Data Pipelines

Enterprise data rarely exists in the exact form or format that data scientists require for a given project. Instead, raw data must pass through a series of transformations that cleanse and normalize it before it can be used for training. Once a model is put into production, the same sequence of transformations must be applied to incoming data to prepare it for inference. Early in the exploratory phase of model development, these transformations are often applied in an ad hoc manner. However, manual transformations should give way to programmatically executed ones as early as possible, because manual steps are highly error-prone, not readily repeatable, and do not scale.
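
As a rough illustration of what a programmatically executed transformation sequence might look like, the following minimal Python sketch defines the steps once and reuses them for both training and inference. Every name in it (cleanse, Normalizer, Pipeline) is hypothetical and chosen for illustration only. Dropping missing values and z-score scaling are assumed stand-ins for whatever cleansing and normalization a real project requires.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional


def cleanse(values: list[Optional[float]]) -> list[float]:
    """Drop missing entries (an assumed stand-in for real cleansing rules)."""
    return [v for v in values if v is not None]


@dataclass
class Normalizer:
    """Z-score scaling whose parameters are learned from training data only."""
    mu: float = 0.0
    sigma: float = 1.0

    def fit(self, values: list[float]) -> "Normalizer":
        self.mu = mean(values)
        # Fall back to 1.0 so constant columns do not cause division by zero.
        self.sigma = pstdev(values) or 1.0
        return self

    def transform(self, values: list[float]) -> list[float]:
        return [(v - self.mu) / self.sigma for v in values]


class Pipeline:
    """Applies the same ordered steps at training time and at inference time."""

    def __init__(self) -> None:
        self.normalizer = Normalizer()

    def fit_transform(self, raw: list[Optional[float]]) -> list[float]:
        # Training path: cleanse, learn the scaling parameters, then scale.
        clean = cleanse(raw)
        return self.normalizer.fit(clean).transform(clean)

    def transform(self, raw: list[Optional[float]]) -> list[float]:
        # Inference path: identical steps, reusing the learned parameters.
        return self.normalizer.transform(cleanse(raw))


pipeline = Pipeline()
train_features = pipeline.fit_transform([2.0, None, 4.0, 6.0])
live_features = pipeline.transform([4.0, None])
```

Because the scaling parameters are learned once during training and stored on the pipeline object, inference data is transformed in exactly the same way as the training data, which is the repeatability that ad hoc manual steps cannot guarantee.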

