We’re proud to announce the new TWIML Solutions Guide, a directory of machine learning tools and platform technologies for data scientists, ML engineers and other AI practitioners and leaders. The Guide aims to help them explore and compare open source and commercial offerings for building, delivering, and improving their ML and AI projects.
The tools in this space are important because, as we mentioned in The Definitive Guide to Machine Learning Platforms:
“It’s our belief that effective platforms are key to delivering ML and AI at scale. These platforms support data science and ML engineering teams by allowing them to innovate more quickly and consistently.”
Our goal is to bring some clarity to the increasingly crowded marketplace for MLOps platforms and tools.
Visualizing the ML Platforms and Infrastructure Space
One of our first efforts at visualizing the ML platforms and infrastructure space in one place appeared in the first edition of Kubernetes for MLOps. At the time it included nearly 60 products, most of which weren’t even ML specific.
When we revised the Kubernetes ebook a year ago, it became clear how much of a challenge it would be to capture a rapidly growing ecosystem in a relatively static diagram. At that point we set out to produce the Solutions Guide and never looked back.
The “ML Landscape Landscape”
As we surveyed the “ML landscape landscape,” we saw three primary approaches being used to make sense of the Cambrian explosion of innovation in the space: AI landscape maps, MLOps landscape maps, and broad directory sites.
First we saw the excellent AI market maps produced by folks like CB Insights and VentureBeat. These annually updated maps help us all understand how the structure of the market is evolving and we think they’re a good and useful high-level view of the market.
Next, once MLOps came into its own, we began to see similar market maps and lists focused specifically on that space. Examples of this approach include Chip Huyen’s excellent What I learned from looking at 200 machine learning tools (the most recent version of which includes nearly 300 tools) and, more recently, Daniel Jeffries’ Rise of the Canonical Stack in Machine Learning and Maximizing ML Infrastructure Tools for Production Workloads by the AI Infrastructure Alliance.
Finally, we saw directory sites that cover everything from Analytics to Virtualization across thousands of unrelated categories. These sites are “an inch deep and a mile wide” and have no domain depth or credibility within the ML/AI community.
A Gap in ML, AI and MLOps Solutions Directories
What we saw was a gap for both builders and providers. Builders need a trusted resource they can turn to in order to understand the market and its various offerings, one where they can cut through the hype and noise and compare products side by side in a meaningful way. Providers need a place to meet serious, educated, interested buyers. All of this should be built on a foundation of shared learning and conversation about the space.
We set out to fill that gap by building a directory that combined the best aspects of all three existing approaches and then added the “TWIML touch.”
In particular, we believe that by providing rich, segment-specific, feature-level detail about each of the included offerings we’ve created a tool that is more useful for builders than the alternatives.
Key Features of MLOps Solutions
For the MLOps-focused solutions in the Guide, the features we’re tracking today include:
Defining, researching, and applying these features has been the hardest part of producing the Solutions Guide. Some of the many issues we encountered include:
- Lack of well-documented and agreed-upon feature and category definitions
- Blurry, overlapping, and hierarchical feature and category definitions
- Inherent subjectivity, e.g., “how much depth constitutes having a feature?”
- Confusing vendor documentation and marketing materials, and overstatement of product capabilities
Nonetheless, we’re very excited about the Solutions Guide’s ability to provide both concrete value today and a foundation to build on in the future.
Top ML Platforms and Tools Compared
The infographics below help illustrate the value of the Solutions Guide’s feature-level detail.
Comparing the Top 20 End-To-End ML Platforms
The first graphic presents 20 of the top End-to-End MLOps Platforms in the Solutions Guide. Each of the shaded boxes represents a feature in the Guide that’s either present or not in each solution. Taken together, the boxes form a visual map of feature completeness for each solution and for the space as a whole, subject to the caveats noted above.
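For readers who want to reason about this kind of chart numerically, here is a minimal sketch of a feature-presence matrix in Python. The solution names, feature names, and values are hypothetical placeholders, not data from the Solutions Guide, and the two ratios are just one simple way to summarize "feature completeness."

```python
from typing import Dict, List

# Hypothetical feature names, for illustration only (not the Guide's actual feature list).
FEATURES: List[str] = [
    "data_versioning",
    "experiment_tracking",
    "model_serving",
    "monitoring",
]

# Hypothetical solutions: True means the feature is present (a shaded box).
MATRIX: Dict[str, Dict[str, bool]] = {
    "Platform A": {"data_versioning": True, "experiment_tracking": True,
                   "model_serving": True, "monitoring": False},
    "Platform B": {"data_versioning": False, "experiment_tracking": True,
                   "model_serving": False, "monitoring": False},
    "Platform C": {"data_versioning": True, "experiment_tracking": True,
                   "model_serving": True, "monitoring": True},
}


def solution_completeness(matrix: Dict[str, Dict[str, bool]],
                          features: List[str]) -> Dict[str, float]:
    """Fraction of tracked features present in each solution (one row of the chart)."""
    return {
        name: sum(row.get(f, False) for f in features) / len(features)
        for name, row in matrix.items()
    }


def feature_coverage(matrix: Dict[str, Dict[str, bool]],
                     features: List[str]) -> Dict[str, float]:
    """Fraction of solutions that offer each feature (one column of the chart)."""
    return {
        f: sum(row.get(f, False) for row in matrix.values()) / len(matrix)
        for f in features
    }


def render(matrix: Dict[str, Dict[str, bool]], features: List[str]) -> str:
    """Text version of the chart: '#' for a present feature, '.' for an absent one."""
    lines = []
    for name, row in matrix.items():
        cells = "".join("#" if row.get(f, False) else "." for f in features)
        lines.append(f"{name:<12}{cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(MATRIX, FEATURES))
    print(solution_completeness(MATRIX, FEATURES))
    print(feature_coverage(MATRIX, FEATURES))
```

A simple present/absent flag cannot capture the "how much depth constitutes having a feature?" question raised earlier, so any ratio computed this way inherits the same caveats as the chart.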
Comparing the Top 20 Specialist ML Tools
The second graphic focuses on 20 of the top Specialist ML Tools in the Solutions Guide, similarly illustrating their features.
Note: The included solutions are listed in the appendix below along with links to their Solutions Guide profiles.
We will expand the Guide beyond MLOps into adjacent areas, adding features as we go. For example, we’ll be expanding on the Data Labeling and Annotation category soon, adding support for capabilities specific to those tools, such as supported content types (text, image, video, audio, sensor, AR/VR); annotation capabilities (detection, recognition, classification, translation, trajectory prediction); labeling methods (machine-only, machine-human hybrid, or human workforce); and even certifications.
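To make those category-specific capabilities concrete, here is a rough sketch of how a data labeling tool profile could be modeled and filtered. The enum values come from the lists in the paragraph above; the class names, field names, and example tools are assumptions made for illustration and do not describe the Guide's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class ContentType(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"
    SENSOR = "sensor"
    AR_VR = "AR/VR"


class AnnotationCapability(Enum):
    DETECTION = "detection"
    RECOGNITION = "recognition"
    CLASSIFICATION = "classification"
    TRANSLATION = "translation"
    TRAJECTORY_PREDICTION = "trajectory prediction"


class LabelingMethod(Enum):
    MACHINE_ONLY = "machine-only"
    HYBRID = "machine-human hybrid"
    HUMAN_WORKFORCE = "human workforce"


@dataclass(frozen=True)
class LabelingToolProfile:
    """Hypothetical profile record; field names are illustrative assumptions."""
    name: str
    content_types: FrozenSet[ContentType]
    annotation_capabilities: FrozenSet[AnnotationCapability]
    labeling_methods: FrozenSet[LabelingMethod]
    certifications: FrozenSet[str] = frozenset()


def filter_tools(profiles: List[LabelingToolProfile],
                 content_type: ContentType,
                 capability: AnnotationCapability) -> List[str]:
    """Names of tools that support both the content type and the capability."""
    return [
        p.name for p in profiles
        if content_type in p.content_types
        and capability in p.annotation_capabilities
    ]


# Made-up example tools, not real products.
PROFILES = [
    LabelingToolProfile(
        name="Example Tool 1",
        content_types=frozenset({ContentType.TEXT, ContentType.IMAGE}),
        annotation_capabilities=frozenset({AnnotationCapability.CLASSIFICATION}),
        labeling_methods=frozenset({LabelingMethod.HYBRID}),
    ),
    LabelingToolProfile(
        name="Example Tool 2",
        content_types=frozenset({ContentType.IMAGE, ContentType.VIDEO}),
        annotation_capabilities=frozenset({AnnotationCapability.DETECTION,
                                           AnnotationCapability.CLASSIFICATION}),
        labeling_methods=frozenset({LabelingMethod.HUMAN_WORKFORCE}),
    ),
]

if __name__ == "__main__":
    print(filter_tools(PROFILES, ContentType.IMAGE, AnnotationCapability.CLASSIFICATION))
```

Modeling each capability as a closed set of values is what makes side-by-side comparison possible: two tools can be compared field by field instead of by reading marketing copy.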
End-to-End vs Specialist MLOps Tools
We think the distinction between end-to-end and specialist tools in these graphics is an important one in the space, though, as predicted in The Definitive Guide to Machine Learning Platforms, it continues to blur.
That ebook contains a lengthy discussion on this topic, referred to there as “wide” vs. “deep” and “generalist” vs. “specialist.”
“One of the most interesting and important distinctions among the various tools available to help you build out your organization’s machine learning platform is whether the tool aims to be End-to-End or deep and specialized.
- End-to-End. This refers to generalist tools that seek to provide end-to-end support for various aspects of the ML workflow. These offerings aim to give users a broad platform-in-a-box experience.
- Specialist. This refers to specialist tools that seek to solve one problem deeply. These tools typically have robust APIs and are designed to easily fit into an organization’s existing ML workflow.
It is important to understand that End-to-End vs. Specialist should not be equated to good vs. bad or vice versa. Rather, what’s important is to realize that different technologies have different aims and that each organization will need to identify the best fit for its needs and choose accordingly. Second, this is not necessarily an either/or situation. For many organizations building out their overall ML production system, they may opt for BOTH/AND, going with an End-to-End platform in combination with a few Specialist tools to provide more depth where needed. Your organization’s goals will dictate your requirements which will ultimately dictate your architecture and therefore components.”
What you see today at twimlai.com/solutions is just the beginning. We will continue to refine and evolve the Solutions Guide over time, adding detail, TWIML’s unique perspective, and new features to make the Guide even more useful.
Have feedback for us? We definitely want to hear from you!
Read this far? Clearly you’re into this! Maybe you should join us?!? We’re hiring in research and marketing roles in support of this project. I’d love to hear from you!
Check out these links to the Solutions Guide profiles for the offerings mentioned in the “Top 20” graphics in this post.
End-to-End ML Platforms
Amazon SageMaker, Azure Machine Learning, IBM Watson Studio, Dataiku Data Science Studio, Google Vertex AI, Databricks, KNIME, Paperspace Gradient, RapidMiner Studio, ClearML, HPE Ezmeral MLOps, Splice Machine ML Manager, Verta, Cloudera Machine Learning, H2O, Iguazio, Domino Data Lab, Algorithmia, Cnvrg.io, DataRobot
Specialist ML Tools
Tecton, Snorkel, Hopsworks, DVC, Alectio, Pachyderm, IBM Cloudpak for Data, Spell, Valohai, Weights & Biases, Determined AI, CometML, Anaconda, Modzy, Datatron, Seldon Core, Arthur AI, Fiddler Labs, BentoML, OctoML