ML engineers and data scientists spend most of their time testing and validating their models’ performance. As machine learning products become more integral to our daily lives, the importance of building a systematic, rigorous testing process will only grow. Current ML evaluation techniques fall short of describing the full picture of model performance: evaluating models with only global metrics (like accuracy or F1 score) produces a low-resolution view that fails to capture how a model behaves across different types of cases, attributes, and scenarios. It is rapidly becoming vital for ML teams to understand exactly when and how their models fail, and to track these failure cases across model versions so they can identify regressions. We’ve seen great results from teams applying unit and functional testing techniques to their models. In this presentation, we cover why systematic unit testing is important and how to effectively test ML system behavior.
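To make the idea concrete, here is a minimal sketch of what a slice-level unit test might look like, assuming a scikit-learn-style classifier and a categorical attribute that defines the slices. The synthetic data, the `MIN_SLICE_ACCURACY` threshold, and the test names are illustrative, not from the talk itself.

```python
"""Sketch: per-slice unit tests for a classifier (run with pytest).

Assumes a scikit-learn-style model; all data and thresholds below are
hypothetical and exist only to illustrate the testing pattern.
"""
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic dataset with a categorical attribute (e.g., a user segment)
# that partitions examples into slices we want to test separately.
X = rng.normal(size=(1000, 5))
attribute = rng.integers(0, 3, size=1000)
y = (X[:, 0] + 0.5 * attribute + rng.normal(scale=0.5, size=1000) > 1).astype(int)

model = LogisticRegression().fit(X, y)
preds = model.predict(X)

MIN_SLICE_ACCURACY = 0.80  # illustrative threshold


def test_global_accuracy():
    # The usual global metric: necessary, but not sufficient on its own.
    assert accuracy_score(y, preds) >= MIN_SLICE_ACCURACY


def test_accuracy_per_slice():
    # Fails if any attribute slice degrades, even when the global score passes.
    for value in np.unique(attribute):
        mask = attribute == value
        slice_acc = accuracy_score(y[mask], preds[mask])
        assert slice_acc >= MIN_SLICE_ACCURACY, f"slice {value}: acc={slice_acc:.2f}"
```

Because these tests are ordinary pytest functions, they can be rerun against each new model version, turning slice-level failures into trackable regressions rather than numbers buried in a global score.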