Data Versioning

Achieving the highest levels of reproducibility, and being able to forensically analyze deployed models and their decisions, requires going back in time to determine the exact state of the training data when a model was trained. Data Versioning is therefore a critical element of Data Lineage. Many new products and systems are being developed in this area, and many of them rely on a Git-style workflow: they store metadata (data about the data) and pointers to the source data, rather than cloning a massive dataset for every version. This sub-field of ML is evolving rapidly, and we’ll be covering a lot more on it in the near future.
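
To make the metadata-and-pointers idea concrete, here is a minimal sketch in Python. It is not the method of any particular product. It assumes the dataset is a directory of files, and the function names (file_digest, build_manifest, manifest_id) are hypothetical names chosen for illustration. The sketch records a path, size, and content hash for each file, then derives a version identifier from that small manifest instead of copying the data itself.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Hypothetical helper: record a pointer (relative path), size, and
    content hash for every file under data_dir. No file data is copied."""
    entries = []
    for p in sorted(data_dir.rglob("*")):
        if p.is_file():
            entries.append({
                "path": p.relative_to(data_dir).as_posix(),
                "size": p.stat().st_size,
                "sha256": file_digest(p),
            })
    return {"files": entries}


def manifest_id(manifest: dict) -> str:
    """Hypothetical helper: derive a version identifier by hashing the
    manifest's canonical JSON form."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, the manifest is small enough to commit to a Git repository next to the training code, and the identifier can be recorded with each trained model. Any change to a file's contents changes the identifier, which is what allows a later check of whether the data matches what the model was trained on. Recovering the old data itself still requires that the original files be retained somewhere.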

