Once a model is put into production, it is important to monitor its ongoing performance. Machine learning models are perishable, meaning that the statistical distribution of the data a model sees in production will inevitably start to drift away from that seen during training. This causes model performance to degrade; if the degradation is not detected and addressed, it can result in negative business outcomes such as lost sales or undetected fraudulent activity.
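One common way to quantify this kind of drift (not prescribed by the text, but widely used) is the Population Stability Index, which compares the binned distribution of a feature in production against the training baseline. The sketch below is a minimal pure-Python illustration; the feature values and the `psi` helper are hypothetical.

```python
import math
import random

def psi(train_sample, prod_sample, n_bins=10):
    """Population Stability Index for one numeric feature.

    Bin edges come from quantiles of the training sample, so each
    training bin holds roughly an equal share of the data.
    """
    train_sorted = sorted(train_sample)
    edges = [train_sorted[int(len(train_sorted) * i / n_bins)]
             for i in range(1, n_bins)]

    def bin_fractions(sample):
        counts = [0] * n_bins
        for x in sample:
            counts[sum(1 for e in edges if x >= e)] += 1
        # clip empty bins to one observation to avoid log(0)
        return [max(c, 1) / len(sample) for c in counts]

    p = bin_fractions(train_sample)
    q = bin_fractions(prod_sample)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]
stable = [random.gauss(0.0, 1.0) for _ in range(5000)]
drifted = [random.gauss(0.5, 1.0) for _ in range(5000)]  # mean has shifted

print(f"PSI (stable):  {psi(train, stable):.3f}")   # near zero
print(f"PSI (drifted): {psi(train, drifted):.3f}")  # noticeably larger
```

A rule of thumb sometimes used in practice treats PSI below about 0.1 as stable and values above roughly 0.25 as significant drift worth investigating, though appropriate thresholds depend on the model and feature.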
Production models should be instrumented so that the inputs to and results of each inference are logged, allowing usage to be reviewed and performance to be monitored on an ongoing basis. Owners of models experiencing degraded performance can use this information to take corrective action, such as retraining or re-tuning the model.
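The logging described above can be as simple as appending one structured record per inference. The sketch below is a minimal, hypothetical illustration using JSON lines; the `log_inference` helper and the field names are assumptions, and in production the sink would be a file, queue, or logging service rather than an in-memory buffer.

```python
import io
import json
import time
import uuid

def log_inference(model_name, features, prediction, log_file):
    """Append one inference record as a JSON line for later review."""
    record = {
        "id": str(uuid.uuid4()),   # lets the record be joined to outcomes later
        "ts": time.time(),         # when the inference happened
        "model": model_name,
        "features": features,      # inputs to the inference
        "prediction": prediction,  # result of the inference
    }
    log_file.write(json.dumps(record) + "\n")
    return record

# StringIO stands in for a real log sink in this sketch.
sink = io.StringIO()
log_inference("fraud-model-v3", {"amount": 250.0, "country": "DE"}, 0.87, sink)
first = json.loads(sink.getvalue().splitlines()[0])
print(first["model"], first["prediction"])  # → fraud-model-v3 0.87
```

Recording a unique ID and timestamp with each inference makes it possible to join predictions back to eventual ground-truth outcomes, which is what ultimately allows performance to be measured over time.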
Business or regulatory considerations may impose additional requirements dictating the form or function of model monitoring for audit or compliance purposes, including in some cases the ability to explain model decisions (see also Bias Detection and Mitigation).
Model monitoring is generally a concern shared by all of an organization’s production models, and thus ideally supported by a common framework or platform. This has the advantage of making monitoring “free” for data scientists, that is, something they get the benefit of but don’t need to worry about building. While some models or applications may have unique monitoring or reporting needs that require the involvement of data scientists or ML engineering staff, they should ideally be able to take advantage of low-level “plumbing” that is already in place.
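One way such shared “plumbing” is often realized (a sketch, not a prescribed design) is as a decorator that a platform team provides and data scientists simply apply to their prediction functions, so every call is instrumented without any model-specific monitoring code. The `monitored` decorator, model name, and toy prediction logic below are all hypothetical.

```python
import functools
import time

def monitored(model_name, sink):
    """Platform-provided decorator: records every call to a predict function."""
    def wrap(predict_fn):
        @functools.wraps(predict_fn)
        def inner(features):
            start = time.perf_counter()
            result = predict_fn(features)
            sink.append({
                "model": model_name,
                "features": features,
                "prediction": result,
                "latency_ms": (time.perf_counter() - start) * 1000,
            })
            return result
        return inner
    return wrap

records = []  # stands in for the platform's real monitoring backend

@monitored("churn-model-v2", records)
def predict(features):
    # toy stand-in for a real model
    return 1 if features["tenure_months"] < 6 else 0

print(predict({"tenure_months": 3}))  # → 1
print(len(records))                   # → 1
```

The point of the pattern is the division of labor: the decorator body is written once by the platform team, while model owners only add one line above their function and get logging and latency measurement for free.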