Todd Underwood

Senior Director, ML SRE
Google

Todd Underwood is a Director at Google, where he leads Machine Learning for Site Reliability Engineering (ML SRE). He is a co-author of the forthcoming book Reliable Machine Learning (O'Reilly, Sept 2022) and is also Site Lead for Google's Pittsburgh office. ML SRE teams build and scale internal and external ML services and are critical to almost every Product Area at Google.

Conference Sessions

Perspective
TWIMLcon  2022
Model quality and performance are typically thought of as a problem within scope for model developers more than for ML production engineers. But changes in model quality represent the only truly end-to-end test of ML infrastructure.
Case Study
TWIMLcon  2021
Delivering a bad model into production/serving is deceptively easy to do. Using a hand analysis of approximately 100 incidents, we identify common causes and manifestations of failures, provide some ideas for how to measure the potential damage, and propose a set of techniques for detecting problems before they cause damage.