Model Deployment and Operations

Once a model has been developed, it must be deployed before it can be used. Deployed models take many forms, but typically the model is either embedded directly in application code or placed behind an API. Running a model in production is often more computationally expensive than training it. Unlike the cost of training, the computational burden of inference scales with the number of inferences made and continues for as long as the model is in production. Meeting the requirements of inference at scale is a classic systems engineering problem, in which scalability, availability, latency, and cost are typically the primary concerns.
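To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. The function names and all cost figures are hypothetical and only illustrate how a per-request cost accumulates relative to a one-time training cost.

```python
def cumulative_inference_cost(cost_per_inference: float,
                              requests_per_day: float,
                              days: float) -> float:
    """Total serving cost after `days` in production (same currency unit as the inputs)."""
    return cost_per_inference * requests_per_day * days


def days_until_inference_matches_training(training_cost: float,
                                          cost_per_inference: float,
                                          requests_per_day: float) -> float:
    """Days in production after which cumulative inference cost equals the training cost."""
    daily_cost = cost_per_inference * requests_per_day
    if daily_cost <= 0:
        raise ValueError("cost_per_inference and requests_per_day must be positive")
    return training_cost / daily_cost


# Hypothetical figures: training cost 1000 units, 0.5 units per request, 100 requests per day.
breakeven_days = days_until_inference_matches_training(1000, 0.5, 100)
```

With these illustrative numbers, serving overtakes training after 20 days, and the gap keeps growing for as long as the model stays in production.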

Furthermore, if mobile or edge deployment is a goal, you may need to compress (for example, “quantize”) the model and translate (“adapt”) it to run on smaller devices with different processors, lower power budgets, and less memory.
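As an illustration of what quantization does, the following minimal sketch applies symmetric 8-bit quantization to a list of weights in plain Python. It is a simplified assumption of how such a scheme can work, not the method of any particular toolkit; the helper names `quantize_int8` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid division by zero when all weights are zero (or the list is empty).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
```

Each weight is stored in one byte instead of four or eight, at the price of a rounding error of at most half the scale factor per weight.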

In addition to the fundamentals of running a high-uptime, low-latency, real-time prediction service, it is important to monitor and alert on both the accuracy and the performance of the model in production. If the model becomes less accurate or less fit for purpose (“drift”), it may need to be retrained or even decommissioned. If it becomes resource-constrained (for example, because of too much traffic), the model may need to be modified or the infrastructure that serves it may need to be scaled out. Both kinds of monitoring are critical to operating a model successfully.
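One simple way to watch for declining accuracy is to track the share of correct predictions over a sliding window and alert when it falls below a threshold. The sketch below assumes ground-truth labels eventually become available for production predictions; the class name, window size, and threshold are hypothetical choices for illustration.

```python
from collections import deque
from typing import Optional


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent `window` labelled predictions."""

    def __init__(self, window: int = 500, min_samples: int = 50,
                 threshold: float = 0.9) -> None:
        self.outcomes = deque(maxlen=window)  # True for correct, False for wrong
        self.min_samples = min_samples
        self.threshold = threshold

    def record(self, prediction, label) -> None:
        """Store whether one prediction matched its ground-truth label."""
        self.outcomes.append(prediction == label)

    def accuracy(self) -> Optional[float]:
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        """True once enough samples exist and accuracy is below the threshold."""
        if len(self.outcomes) < self.min_samples:
            return False
        return self.accuracy() < self.threshold
```

A monitoring job would call `record` as labels arrive and page an operator, or trigger retraining, when `should_alert` returns true. Where labels arrive late or never, comparing input distributions against the training data is a common substitute.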

In short, deployment and operational tooling generally includes some or all of the following elements:

  • API and endpoint creation and management
  • Packaging, portability, and compression (quantization)
  • Model serving
  • EdgeML support
  • Infrastructure monitoring and alerting
  • Model accuracy and drift monitoring
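
On the infrastructure side, a rough capacity estimate can help decide how far to scale out once monitoring shows a service is resource-constrained. The sketch below uses Little's law (requests in flight = arrival rate × mean latency); the function name, the per-replica concurrency, and the target utilization are assumptions for illustration.

```python
import math


def replicas_needed(requests_per_second: float,
                    mean_latency_seconds: float,
                    concurrency_per_replica: int = 1,
                    target_utilization: float = 0.7) -> int:
    """Estimate how many model-serving replicas are needed for a given load."""
    if concurrency_per_replica <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("concurrency must be positive and utilization in (0, 1]")
    # Little's law: average number of requests being processed at any moment.
    in_flight = requests_per_second * mean_latency_seconds
    # Leave headroom by planning for only a fraction of each replica's capacity.
    usable_capacity = concurrency_per_replica * target_utilization
    return max(1, math.ceil(in_flight / usable_capacity))


# Hypothetical load: 40 requests/s at 0.25 s mean latency,
# 4 concurrent requests per replica, planned at 50% utilization.
replicas = replicas_needed(40, 0.25, concurrency_per_replica=4, target_utilization=0.5)
```

This ignores traffic bursts and tail latency, so it is best read as a lower bound to check against observed metrics.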
Model serving made easy
Monitor, explain, and optimize ML models
Operationalize Responsible AI with Credo AI
Everything you need to go from pixels to value
Make your data science, engineering, devops, and governance teams work more efficiently
Know the why and the how behind your AI solutions
Innovate faster with enterprise-ready generative AI
The Enterprise Feature Store that provides seamless collaboration and maximum real-time performance.
The AI community building the future
Your trusted platform for enterprise AI
The machine learning acceleration platform
Maintain ML Integrity by eliminating AI failures
Run complex AI training and inference workloads at maximum speed and cost efficiency using Run:AI’s Compute Orchestration Platform on-prem or in the cloud.
Open-source platform for rapidly deploying machine learning models on Kubernetes
Spell is DLOps
The Enterprise Feature Store for Machine Learning. Build a library of great features. Serve them in production. Do it at scale.
Trustworthy AI for better business decisions
ML. The Pioneer Way.
AI and machine learning model management and operations for enterprise data science teams
Trustworthy, Optimized, Enterprise-Wide Deployment and Management of Machine Learning Models at Scale.