Drawing Board 1 - Do I build a file system?

This is a quick review, and a series of revisions, after finishing the first sprint on Your Data Place (YDP) and playing around with integrations and SDKs. So... what is YDP?

At its core, YDP started as a way to store your old datasets for quick reuse, agile data discovery, and analysis. Ideally, you could drop your datasets into YDP, then write lambda-like functions that execute against them seamlessly, return results, and cultivate an environment of experimentation.
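To make the "lambda-like function" idea a bit more concrete, here's a minimal sketch in Python. `DataPlace`, `drop`, and `run` are hypothetical names I'm using for illustration, not a real YDP API, and the "store" is just an in-memory dict rather than persistent storage.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class DataPlace:
    """Hypothetical in-memory stand-in for YDP: named datasets plus functions run against them."""

    def __init__(self) -> None:
        self._datasets: Dict[str, List[Row]] = {}

    def drop(self, name: str, rows: List[Row]) -> None:
        # "Drop" a dataset into the store under a name so it can be reused later.
        self._datasets[name] = list(rows)

    def run(self, name: str, fn: Callable[[List[Row]], Any]) -> Any:
        # Execute a lambda-like function against a stored dataset and hand back its result.
        return fn(self._datasets[name])


place = DataPlace()
place.drop("sales_2019", [{"region": "east", "amount": 120}, {"region": "west", "amount": 80}])
total = place.run("sales_2019", lambda rows: sum(r["amount"] for r in rows))
print(total)  # 200
```

The point is the shape: the dataset is stored once, and any number of small throwaway functions can be pointed at it for quick experiments.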

From there, more ideas jumped off the page: iterative datasets for managing big data easily, blending data, expanding columns using logic (like converting addresses to latitude/longitude pairs), easy integrations with the Google APIs I use so often (Sheets, etc.), transforms vs. dataset creators, uploading dependencies through requirements.txt or conda.deps, timer logic for expressing complicated timers in code, the aforementioned "code being the best form of expression for data", and so much more.
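Expanding a column with logic could look something like the following sketch. `expand_column` and `to_lat_lon` are hypothetical helpers, and `FAKE_GEOCODER` is a hard-coded lookup standing in for a real geocoding service, so the coordinates are purely illustrative.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def expand_column(rows: List[Row], source: str, fn: Callable[[Any], Dict[str, Any]]) -> List[Row]:
    """Return new rows with extra columns derived from `source` by `fn`; the input rows are untouched."""
    return [{**row, **fn(row[source])} for row in rows]


# Hypothetical stand-in for a real geocoding service; coordinates are made up.
FAKE_GEOCODER: Dict[str, Tuple[float, float]] = {
    "1600 Example Ave": (40.0, -75.0),
}


def to_lat_lon(address: str) -> Dict[str, Optional[float]]:
    # Unknown addresses get empty coordinates instead of raising.
    lat_lon = FAKE_GEOCODER.get(address)
    if lat_lon is None:
        return {"lat": None, "lon": None}
    return {"lat": lat_lon[0], "lon": lat_lon[1]}


rows = [{"id": 1, "address": "1600 Example Ave"}, {"id": 2, "address": "unknown"}]
expanded = expand_column(rows, "address", to_lat_lon)
print(expanded[0])  # {'id': 1, 'address': '1600 Example Ave', 'lat': 40.0, 'lon': -75.0}
```

Keeping the expansion logic as a plain function means the same `to_lat_lon` could be swapped for a real geocoder without touching `expand_column`.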

This is all great, and I still find the ideas extremely valuable and implementable, but I can't shake the thought that it needs more. Sure, it solves my problems, but I have this itch that there's a meta-problem sitting outside the problems I already solve, one whose solution would streamline my workflow and generate more value in my professional career.

One big decision I've been going back and forth on is whether the client UI will emulate a full file system (FS). That's a huge positive, but it also seems like a massive negative, in that (in my head) the app becomes more of a cloud IDE than a data platform. One implementation that seems to sit outside the IDE side of things is to allow an FS to exist, let people build code that executes in that FS context, and then build meta-scripts (the current "dataset creator" scripts) that make use of the file system.
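Here's one rough take on a "dataset creator" meta-script that runs in a file-system context. It assumes the project exposes its files under a root directory with raw data in a `data/` folder; both that layout and `create_dataset` are hypothetical, and a temporary directory stands in for the project FS.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def create_dataset(project_root: Path) -> List[Dict[str, str]]:
    """Hypothetical dataset creator: read every CSV under <project_root>/data and concatenate the rows."""
    rows: List[Dict[str, str]] = []
    for csv_path in sorted((project_root / "data").glob("*.csv")):
        with csv_path.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


# Simulate a project file system with a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "a.csv").write_text("name,score\nada,3\n")
    (root / "data" / "b.csv").write_text("name,score\nbob,5\n")
    dataset = create_dataset(root)

print(dataset)  # [{'name': 'ada', 'score': '3'}, {'name': 'bob', 'score': '5'}]
```

The script only knows about paths, so the FS can stay a dumb file store while the "dataset creator" remains the thing users actually write.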

Another illustration: users could create SDKs or commonly reusable "scripts" that, when imported, are brought into the project and become referenceable. This would streamline jobs down to 3-10 line scripts (transformations, jobs, or dataset makers), with all the hard logic developed in a real IDE and then provided to the project.
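The import idea can be approximated with plain Python: put a project's shared folder on `sys.path` and import from it. The `shared/` folder, the `ydp_helpers` module, and `clean_names` are all hypothetical, and this only sketches the mechanism, not how YDP would actually resolve imports.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical shared module, developed in a real IDE and uploaded to the project.
SHARED_SOURCE = '''
def clean_names(rows):
    """Strip whitespace and title-case the 'name' field of each row."""
    return [{**r, "name": r["name"].strip().title()} for r in rows]
'''

with tempfile.TemporaryDirectory() as tmp:
    shared_dir = Path(tmp) / "shared"
    shared_dir.mkdir()
    (shared_dir / "ydp_helpers.py").write_text(SHARED_SOURCE)
    # "Bringing it into the project" here just means making the folder importable.
    sys.path.insert(0, str(shared_dir))
    try:
        importlib.invalidate_caches()  # make sure the freshly written file is noticed
        helpers = importlib.import_module("ydp_helpers")
    finally:
        sys.path.remove(str(shared_dir))

# The job itself stays a few lines long; the hard logic lives in the shared module.
raw = [{"name": "  ada lovelace "}, {"name": "GRACE hopper"}]
result = helpers.clean_names(raw)
print(result)  # [{'name': 'Ada Lovelace'}, {'name': 'Grace Hopper'}]
```

The last three lines are the "3-10 line script"; everything above them is plumbing the platform would handle for the user.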

There seems to be no need to reinvent the wheel, and I definitely have no passion for IDE engineering. Data, on the other hand...
