Data Prep Ops: The Missing Piece of the ML Lifecycle Puzzle
The 2010s will likely be remembered as the decade when 60 years of ML research finally blossomed into a long-awaited surge of real-life AI applications. Nonetheless, our delay in standardizing the ML lifecycle, and in establishing it as an inherent part of developing ML applications, brought us eerily close to yet another AI Winter.
And yet, despite a recent boom in ML Ops tools that make it easier to deploy and manage ML models in production, the alarming number of industry leaders complaining about the disappointing ROI of their AI initiatives shows that we are not out of the woods yet. So why do all these trailblazing advances in both hardware and ML research still fail to deliver mainstream autonomous driving, and so much more? What are we doing wrong?
In her talk, Jennifer will dig into how the lack of expert attention to one of the most critical areas of the ML lifecycle, data collection and preparation, is the likely cause of a still highly dysfunctional ML lifecycle. According to her, while conventional wisdom acknowledges that high-quality training data is necessary to build better models, the lack of a formal definition of what constitutes a good dataset (or rather, the right dataset for the task) is the main bottleneck impeding the universal adoption of AI. She will explain how what she calls “Data Prep Ops” can change the game, and how to better incorporate concepts such as active learning, human-in-the-loop ML, and strategic data collection as integral parts of the ML lifecycle.