Model Debugging

Debugging traditional software is a well-understood process with established tools and workflows, backed by decades of industry experience. Machine learning is different. Models are effectively programmed by data, and the probabilistic logic they produce cannot be tested the way code can. Without good debugging tools, training ML models likely costs more time and money than it should, whether the goal is reaching the required accuracy or identifying the issues behind inaccurate predictions.

An ML system consists of datasets, model architecture, model weights, algorithm parameters, and more. Models can perform poorly for many reasons, including features that lack predictive power, suboptimal hyperparameter values, data that contains errors or anomalies, and buggy feature engineering code. A further complication is the time it takes to run an experiment, since each one means training a model and verifying the results. Longer iteration cycles and a larger space of possible errors make debugging ML models a very different challenge.
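Two of the causes listed above, data errors and features without predictive power, can be screened for before any training run. The following is a minimal sketch in plain Python, not a complete data validation tool. The function name `audit_features` and the sample column names are hypothetical and chosen only for illustration.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def audit_features(rows, feature_names):
    """Return {feature name: [issue descriptions]} for columns that look suspicious.

    rows is a list of equal-length lists, one per example.
    Hypothetical helper: it only checks for missing values and constant columns.
    """
    issues = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        found = []

        missing = sum(1 for v in column if _is_missing(v))
        if missing:
            found.append(f"{missing} missing value(s)")

        # A column with a single distinct value cannot help separate examples.
        present = [v for v in column if not _is_missing(v)]
        if present and len(set(present)) == 1:
            found.append("constant column")

        if found:
            issues[name] = found
    return issues


# Illustrative data: "flag" never varies and "score" has one NaN.
sample_rows = [
    [1.0, 5, 0.2],
    [2.0, 5, float("nan")],
    [3.0, 5, 0.4],
]
report = audit_features(sample_rows, ["age", "flag", "score"])
```

A check like this runs in seconds, which matters when a full training experiment can take hours.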

ML debugging is effectively a new discipline that attempts to test ML models, probe their responses and decision boundaries, and vet them for accuracy, fairness, security, and other risk factors.

What is now starting to emerge is a class of tools purpose-built for ML practitioners. These tools let them do the following:

  • Assess both model optimization and the performance of the underlying training infrastructure, and recommend ways to make better use of the hardware.
  • Capture model- and optimizer-specific data during training.
  • React to changes in the captured data through user-specified rules that trigger when certain conditions are met.
  • Analyze data in real time during training to monitor training runs.
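
The capture, rule, and monitoring capabilities above can be illustrated with a small sketch. It does not use the API of any tool named in this article. It is a plain-Python toy that assumes a one-variable linear model trained by gradient descent, and the names `train_with_rules`, `exploding_gradient`, `loss_not_decreasing`, and `RULES` are hypothetical.

```python
import math
import random


def exploding_gradient(history, limit=1e3):
    """Rule: fire when the latest gradient norm is non-finite or above limit."""
    g = history[-1]["grad_norm"]
    return not math.isfinite(g) or g > limit


def loss_not_decreasing(history, patience=5):
    """Rule: fire when the loss is no lower than it was `patience` steps ago."""
    if len(history) <= patience:
        return False
    return history[-1]["loss"] >= history[-1 - patience]["loss"]


# User-specified rules, evaluated after every captured step.
RULES = {
    "exploding_gradient": exploding_gradient,
    "loss_not_decreasing": loss_not_decreasing,
}


def train_with_rules(steps=50, lr=0.1, seed=0, rules=None):
    """Fit y = w*x + b by gradient descent while capturing data and checking rules."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(64)]
    ys = [3.0 * x + 0.5 for x in xs]  # noise-free synthetic target
    n = len(xs)
    w, b = 0.0, 0.0
    history = []  # captured data, one record per step
    alerts = []   # (step, rule name) for every rule that fired

    for step in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errs) / n
        grad_w = 2 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2 * sum(errs) / n

        # Capture model- and optimizer-specific values for this step.
        history.append(
            {"step": step, "loss": loss, "grad_norm": math.hypot(grad_w, grad_b)}
        )

        # React during training: evaluate each rule on the data so far.
        for name, rule in (rules or {}).items():
            if rule(history):
                alerts.append((step, name))

        w -= lr * grad_w
        b -= lr * grad_b

    return history, alerts


# A reasonable learning rate versus one that is deliberately too large.
good_history, good_alerts = train_with_rules(steps=50, lr=0.1, rules=RULES)
bad_history, bad_alerts = train_with_rules(steps=20, lr=5.0, rules=RULES)
```

In this sketch the rules run inside the training loop, so a diverging run is flagged within a few steps instead of after the experiment finishes. A production tool would typically run the rules in a separate process and could stop the job when one fires.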
Weights & Biases
With a few lines of code, save everything you need to debug, compare and reproduce your models
Determined AI
Build models, not infrastructure
Build better models faster
Industry-leading AI OS for machine learning
Amazon SageMaker
Machine learning for every developer and data scientist
Your entire MLOps stack in one open-source tool
Cloudera Machine Learning
Industrialize AI with Cloudera Machine Learning
Dataiku Data Science Studio
Collaborative data science