Posts

Siesta of YDP

Unfortunately, I've had to take a siesta from Your Data Place. I'm still extremely passionate about it, but most of my spare time has been consumed by a half-work, half-play project that is niche in my industry and that will allow me more freedom in the future if it ends up working. I'm using some of the innovations I've outlined here in that project, to make it completely unique and distinct from the current competitors. It's in the pharmaceutical space, and I may start to think through concepts on this blog, as some of the ideas are quite abstract but need to be represented simply and easily to the users. In the time since I last wrote, I have not learnt Rust to a proficient level. I have learnt Firebase, and I've become better at frontend web design and at building software applications in Python. I will return to Rust and YDP, as these are my raw passions. I believe my experience in this next project will greatly inspire more ideas in YD

The start: a week of Rust

I've taken a small break to focus on my primary work obligations, and in that time I've had plenty of time to ponder. During the break, I flew to Tasmania (on a 737-800) and read through a few preliminary chapters of Manning Publications' "Math for Programmers" (Paul Orland). This book is a real workhorse for programmers who know they want to implement more mathematics in their code but find conventional teaching methods difficult to fast-track. Home now, I plan to spend a week giving Rust a real go with the other Manning Publications book, "Rust in Action" (Tim McNamara). In this time, I'll learn everything with a noble aspiration: "How could this benefit the users of YDP?" (as well as my primary work, and other interests of mine like genetics). Rust is fast, and from what I've experienced, it's cracked and goated with the sauce. What I find interesting about Rust is that the author's intent is as clear as d

The Petabyte Project, the Most Valuable Idea

I'm setting a goal: I want to be able to iterate through 1 petabyte of data in a few minutes. 1 petabyte, 1,000,000 gigabytes, all iterated through in 5 minutes. That leaves a time budget of 0.0003 seconds per gigabyte to pull this stunt off... That's practically O(1) time! Although this is a ridiculous goal, I'm setting my optimism up to be crushed so we can have some fun and experiment with indexing and caching technologies. No caveats. This is my goal, but it doesn't have to succeed! Note that this article/idea is evolving, and I'll be editing and appending to its content over time, so please follow and watch out for new updates as I progress on this problem.

Where did this idea come from? As I was driving along the highway to the beautiful Nambucca Heads on the Mid North Coast of Australia for a meeting, I had the windows down, sharp sunglasses on, and a riveting Databricks lecture playing. During this lecture, they talked about the
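
As a quick sanity check on the numbers in the goal above, the sketch below works out the per-gigabyte time budget and the sustained throughput it implies. It assumes decimal units (1 PB = 1,000,000 GB) and the 5-minute figure; the function and variable names are only illustrative and are not part of YDP.

```python
# Back-of-the-envelope check of the petabyte goal.
# Assumption: decimal units, so 1 PB = 1,000,000 GB.
PETABYTE_IN_GB = 1_000_000
TIME_BUDGET_S = 5 * 60  # 5 minutes, expressed in seconds


def per_gigabyte_budget(total_gb: int, budget_s: float) -> float:
    """Seconds available per gigabyte for one full pass over the data."""
    return budget_s / total_gb


def required_throughput_gb_s(total_gb: int, budget_s: float) -> float:
    """Sustained read rate (GB/s) needed to finish within the budget."""
    return total_gb / budget_s


seconds_per_gb = per_gigabyte_budget(PETABYTE_IN_GB, TIME_BUDGET_S)
throughput = required_throughput_gb_s(PETABYTE_IN_GB, TIME_BUDGET_S)

print(f"{seconds_per_gb:.4f} s per GB")
print(f"{throughput:,.0f} GB/s sustained")
```

Run as a script, this reports a budget of 0.0003 seconds per gigabyte, which works out to a little over 3,300 GB/s of sustained throughput. That is why the goal leans on indexing and caching rather than on raw scanning.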

"YDP Podcast": The Worst Idea Yet

A "how to become a billionaire" capitalist herding technique
I plan to start making a semi-structured, timed, pathetic excuse for a clickbait podcast some time in the next week. This is for another community to engage with (YouTube), and for a more casual/entertaining format that I enjoy being in. I'll cover changes, walk through the UI, and also video some of the things I write about. I don't plan for it to be anything; I just want another way to communicate my ideas and brainstorm in real time. For fun, and to make it attractive to the watcher, I plan to give it context (because I believe entertainment is big on context, colors, etc.) with a "how to build a trillion dollar Apple clone empire" type of feel, and I plan to make it as candid as possible. I want to both talk about my great experience with Docker and make fun of it for being a terrible platform.

It's alive! Edit: the first episode is live! Go check it out as I give a great (great because there

Brainstorm: Folders as APIs (FAP-EYES)

FAP-EYES is not the most commercially viable name, but the concept underlying it is very promising. I'd come up with the concept of project files having "attributes", but I didn't have any ideas for how folders could have "attributes". To recap, what are file attributes?

Recap: file attributes
File attributes are basic flags that can be applied to files to tell YDP that "this file does something special", and they allow a bunch of functionality to be automatically inherited by that file. I plan to offer a nice blend of abstract attributes and specific attributes that developers commonly use. For example, below are a few attributes I've brainstormed:
Dataset creator - a file which produces a dataset, which can then be globally imported elsewhere.
Common - a file which is to be accessible from other projects, and easily invokable without worrying about dependencies or Python version.
Transformation - a file which tra
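
As a rough illustration of the "attributes as flags" idea above, here is a minimal sketch using Python's standard `enum.Flag`. The attribute names come from the brainstormed list; everything else (the `FileAttribute` class, the `registry` dictionary, the file names, and the `files_with` helper) is hypothetical and is not YDP's actual implementation.

```python
from enum import Flag, auto


class FileAttribute(Flag):
    """Hypothetical flags, named after the brainstormed attributes."""
    DATASET_CREATOR = auto()
    COMMON = auto()
    TRANSFORMATION = auto()


# Hypothetical registry: project file name -> combined attribute flags.
registry = {
    "make_sales.py": FileAttribute.DATASET_CREATOR | FileAttribute.COMMON,
    "helpers.py": FileAttribute.COMMON,
}


def files_with(attribute: FileAttribute, files: dict) -> list:
    """Return the sorted names of files that carry the given attribute."""
    return sorted(name for name, flags in files.items() if attribute in flags)


print(files_with(FileAttribute.COMMON, registry))
```

Because flags combine with `|`, a single file can carry several attributes at once, and a lookup like `files_with(FileAttribute.COMMON, registry)` finds every file that opts in to that behaviour.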

Shareable data and code in YDP Projects

Datasets, everywhere...
One major ability I want in YDP is the ability to create datasets that can be treated as global, reproducible datasets produced by a dataset generator. Although I presume datasets will primarily be used inside their own projects, I think there's great value in being able to make these datasets global and accessible by any other project.

The Problem
At the moment, the problem is in the implementation. This was no issue when YDP was a sandbox test project in my local filesystem, where I could easily WRITE a file here, then READ a file here under the same name. But in a real production architecture, we're currently using Docker to manage our independent filesystems. When a dataset is generated in Project 1's file system (FS) and then needs to be used in Project 2's FS, what has to occur is something along the lines of the following:
Flag a Python script as a generator for a dataset.
Execute

Your Data Place Projects Migrate to Docker!

Docker is now housing YDP projects (datasets coming soon)
I jumped the gun a bit (after only learning Docker a couple of days ago) and fully migrated Your Data Place projects to containers. This has been an incredible idea-driver for me: I've realized that a lot of the work I'd done for integrations and other concepts (like easy databases) can all be done from within Docker containers to rapidly provide features. This means the execution runtime, project storage, project editing, common SDK work, integration management, and everything else can all be done inside containers. This gives me incredible security, and it lets me constrain the commands to a small subset to radically reduce the attack vectors for YDP.

Containers-as-hosts
One further thought I had was the ability to have the entire UI be container-driven. The way this would work is that you create a project, then get directed to a URL which is being hosted/forwarded through the domain. I'm