Model quality and performance are typically thought of as the model developer’s problem rather than a concern of ML production engineers. But changes in model quality represent the only truly end-to-end test of ML infrastructure, reliably surfacing subtle problems in feature storage, metadata, model configuration, training, and serving. ML production engineers generally avoid directly measuring or responding to changes in model quality as an operational or reliability concern, but that needs to change. In this talk, I elaborate on and support this perspective based on our experiences at Google, explore some of the technical and cultural transformations we’ve had to make to close the model quality loop, and share some of the results our teams have seen through this work.