Data-Centric AI: Why This Trend Is Here To Stay

“Paradoxically, data is the most under-valued and de-glamorized aspect of AI.” – Google Researchers

Scaling deep learning adoption beyond massive consumer internet use cases requires moving past the usual approach of hand-labeling millions of training examples. For most organizations that approach is untenable, for reasons that include high labeling costs and a lack of available data.

Data-centric AI seeks to help organizations build “good enough” AI systems at a reasonable cost. It does so by encouraging a focus on the systematic creation and curation of high-quality training data, rather than on developing novel model architectures and endlessly tweaking hyperparameters. The latter, model-centric approach works well in research, academia, and Kaggle competitions because they emphasize measuring performance against fixed benchmark datasets. It has largely failed enterprises, however, which struggle to deliver viable models for many of their long-tail use cases.

With a data-centric approach, industries not typically associated with AI-driven applications can use AI to innovate by developing systematic engineering practices for improving training data.

In this discussion, we explore how data scientists and ML/AI practitioners can apply data-centric AI ideals to solve their real-world machine learning problems.