"The First Node" of Your Data Place (Live Updated)

What I want - 6th of June, 2022

This is an article describing my progress in developing a data platform, first for myself and then for others. It will first address the common issues I face as a data engineer and pipeline manager, and later those of the groups I work with and of whatever future work I take on. So far, this platform is built on the following principles:

  • Code is the most expressive way of managing data.
  • Python is (at this time) one of the easiest languages to express oneself with.
  • An online platform should make few or no compromises on runtime execution.
From these principles, we will build a data platform.

What I want - 7th of July, 2022 (31 days later)

After writing a lot of content on Your Data Place and publishing a few ideas, I've come to want a few more things out of the platform. These are not feature requests but rather experience requests: how I'd like to operate this platform myself.

  • A platform where I can write out something I'm learning, then easily import it into old and new projects. For me, this is specifically for machine learning. I have so many old, half-baked ML project folders scattered across several of my PCs, ranging from linear regression to attempts at image classification with Keras, but as time goes on, I forget about them. One of my goals this year is to start learning Keras from the ground up again (using a handy book I bought a while back) and to make sure the Your Data Place runtime can cope with ML.
  • A Python SDK (module) that uses data to connect cloud runtimes with local runtimes. With it, I'd be able to write local scripts that leverage (keyword) my YDP runtimes to quickly experiment in new projects when I don't have time to set up a new YDP project.
