Sam Charrington: Hey, what’s up everyone? This is Sam. A quick reminder that we’ve got a bunch of newly formed or forming study groups, including groups focused on Kaggle competitions and the fast.ai NLP and Deep Learning for Coders part one courses. It’s not too late to join us, which you can do by visiting twimlai.com/community. Also, this week I’m at re:Invent and next week I’ll be at NeurIPS. If you’re at either event, please reach out. I’d love to connect.
All right. This week on the podcast, I’m excited to share a series of shows recorded in Orlando during the Microsoft Ignite conference. Before we jump in, I’d like to thank Microsoft for their support of the show and their sponsorship of this series. Thanks to decades of breakthrough research and technology, Microsoft is making AI real for businesses with Azure AI, a set of services that span vision, speech, language processing, custom machine learning, and more. Millions of developers and data scientists around the world are using Azure AI to build innovative applications and machine learning models for their organizations, including 85% of the Fortune 100. Microsoft customers like Spotify, Lexmark, and Airbus choose Azure AI because of its proven enterprise-grade capabilities and innovations, wide range of developer tools and services, and trusted approach.

Stay tuned to learn how Microsoft is enabling developers, data scientists and MLOps and DevOps professionals across all skill levels to increase productivity, operationalize models at scale and innovate faster and more responsibly with Azure machine learning.

Learn more at aka.ms/azureml.

All right, onto the show.

Sam Charrington: [00:01:52] All right everyone, I am here in sunny Orlando, Florida at Microsoft Ignite and I’ve got the pleasure of being seated across from Jordan Edwards. Jordan is a Principal Program Manager for the Azure Machine Learning Platform. Jordan, welcome to the TWIML AI Podcast.

Jordan Edwards: [00:02:08] Oh thanks.

Sam Charrington: [00:02:10] I’m really looking forward to talking with you about our subject for the day, MLOps and related topics. But before we do that, I’d love to hear a little bit about your background. It sounds like you got started at Microsoft where a bunch of folks that are now working on ML and AI got their start: in the Bing group.

Jordan Edwards: [00:02:29] Right. I started at Microsoft a little over seven years ago. I started off working on the big data platforms and related machine learning platforms. Then I ended up working on engineering systems for those platforms, then we decided to take those engineering systems and apply them to machine learning as well. Hence, the internal machine learning platform was born. And as you mentioned, like a bunch of other folks who used to work on Bing, we all got moved into, “Hey, let’s take the cool stuff we built for Bing’s internal engineering platform and bring it to external customers on Azure.” And so I’ve been on the Azure Machine Learning team a little bit over a year now.

Sam Charrington: [00:03:08] Nice, nice. And your role here on the team?

Jordan Edwards: [00:03:11] Yes. I’m the product area lead for what we call MLOps, which is really all about how you bring your machine learning workflows to production.

Sam Charrington: [00:03:19] A topic that we spend a lot of time talking about here on the podcast, as well as at our recent TWIMLcon AI platforms event. Maybe starting, kind of directly connecting to your background, I’m curious about the transition from a team that largely came out of this internal product or project, Bing, and is now trying to generalize those systems and bring that broader knowledge and learnings to the market. What are the commonalities and differences that you encounter in trying to do that?

Jordan Edwards: [00:03:57] So there’s actually a lot of commonalities when you double click on it. But the biggest thing is Bing and Office 365, the internal Microsoft teams, have been doing AI and ML for a long time.
And so they built up a lot of habits and tools and technologies, but also a lot of things that don’t necessarily map to how we see enterprises getting started, right? So most of our external customers today are coming in wanting to do Python-based development, and we have some of that internally. But we also have languages that predate the popularity of Python as a data science platform. We have engineers doing machine learning work in .NET and C++. And so those workflows are a bit different. Also, a lot of the machine learning platforms at Microsoft, as you would imagine, were previously Windows-based. Whereas the new customers coming in want to do things using Linux and containers, and there are newer techniques that are being applied as well.
There’s similarities in there, the ways they wanna solve the problem, but just different tools that they’re using. And also just different amounts of context that have been built up. There’s also the matter of scale. So when you look at teams like Bing, they’ve got a thousand data scientists that are collaborating together to train these huge models. Most of the enterprise customers that we’re talking to, they have small teams scattered all over the place, or they’re trying to staff a team. Or they have a team and they’re not sure how to make best use of their time. And also the most common problem that we’re seeing that they come to us with is, “Hey, we have all these data scientists who are doing work in Jupyter notebooks or whatever, the work is happening on their local machines. We have no idea where the code is, if the code’s even checked in.” And they’re doing all this work, but we can’t leverage any of it on the business side.

Sam Charrington: [00:05:49] There’s so many, so many problems in that problem statement, right?

Jordan Edwards: [00:05:53] Correct.

Sam Charrington: [00:05:54] There is kind of a reproducibility problem. There’s a business-value, path-to-production problem. There is kind of an accountability problem. When you unpack that, do you prioritize those?

Jordan Edwards: [00:06:11] So we try to put it in terms of like a process maturity model. It’s exactly how you framed it. There’s the reproducibility of the work, so another data scientist on the team could reproduce the same work that one person did, and then an automated system could also reproduce that work. Which means you need clean modeling around the code and data and configuration that you’re using in your model development process. Then there’s how do you transition this model, this thing you’ve created, to production. So how do you package it? How do you certify it and how do you roll it out in a controlled fashion? And then at the end, how do you determine the business value of your model? Is it making your business more effective? From a cost point of view, is it worth the amount of compute hours you’re spending and the amount of man-hours you’re spending training these models?
And then at the absolute end of the process maturity model is, “Okay, I’ve got this model, it’s reproducible. I’ve got it deployed out. I’m using it for a production scenario. How do I know when I might need to retrain it?” So completing the circle. And that’s always the question that customers will come and start with: “How do we do automated retraining?” It’s like, “Let’s walk back and begin with how do you reproduce these models in the first place?”

Sam Charrington: [00:07:26] That strikes me as a mature customer that’s asking about automated retraining, right?

Jordan Edwards: [00:07:31] Correct.

Sam Charrington: [00:07:31] Most people are trying to get the model into production in the first place, or many.

Jordan Edwards: [00:07:35] Right. They see the marketing hype, they read all the things like, “Oh, look at this company doing cool automated retraining stuff.” And realistically, it takes a long time to get to that degree of maturity, where you can trust that you have high-quality data coming into your production systems to be able to analyze and compare and figure out, I do need to retrain. And even in the case of, like, Bing and Office ML development teams, there’s never a fully automated retraining loop. It’s always that there’s a scorecard that gets generated and humans go and do some sort of review process prior to your new, larger models going up. Especially when they deal with things like how do you monetize ads, for instance.

Sam Charrington: [00:08:14] So there’s a lot there to dig into, but before we do that, one of the questions that I had for you is, you’ve got MLOps in your title, what does that mean to you?

Jordan Edwards: [00:08:24] So, that means to me that it’s all about how do you take the work that data scientists are doing and make their lives easier, but also make it easier for others, other personas, to come into the fold and take advantage of data science. So the three personas I like to talk about are: you have your data engineer, who has got this giant lake of data. They want to figure out what value they can derive from it. You’ve got your data scientist who’s tasked with finding interesting features in that data and training models on top. And then you’ve got this new emerging persona called the ML engineer, whose responsibility it is to take the work that the data scientist is doing and bring it to production.
And so my job is to help the ML engineer be successful, and help the ML engineer interact well with the data engineering and data science personas that are required to sort of complete that circle. And of course you also have, at the hub and the center of it, your IT ops persona, who’s giving them all of the raw compute and storage resources to get started, making sure everybody plays nicely together and actually connects things end-to-end.

Sam Charrington: [00:09:36] And so there’s kind of an obvious echo to DevOps. To what extent is that, is it inspirational? Is it kind of directly applicable, or is it counter-applicable, meaning just don’t try to do exactly what you’re doing in DevOps?

Jordan Edwards: [00:09:54] I think it’s sort of all three of the things that you mentioned. Shocking, right? Tee me up. So as far as how it’s inspirational, definitely the practices that have been developed in the DevOps field over the past 20 years or so are useful. However, data scientists are not software engineers. And they’re not even engineers. A lot of them are scientists.
So telling them they need to care about things related to the infrastructure and package version management and dealing with all of the intricacies of how to run a production infrastructure. That’s just not something that they’re interested in at all.
So trying to force these habits onto them, we’ve seen this, even trying to get them to write tests for their code. It takes a lot of education on the net value add they’re going to get from it before they’re willing to onboard. So definitely inspirational from a process point of view. A lot of the same tools are applicable, but then you also need new tools that are domain-specific too: How do you do data versioning? How do you do model versioning? How do you validate and run integration testing on models? How do you release and do A/B comparison on a model, as opposed to a normal software application, and know if it’s better or not? So yeah. Inspirational, applicable, and you’ll get hit in the face by a data scientist if you tell them to go and implement all these things themselves.

Sam Charrington: [00:11:23] One of the things you mentioned earlier was testing. What’s the role of testing in an MLOps process and what kind of experiences have you had working with real customers to implement testing procedures that make sense for ML?

Jordan Edwards: [00:11:41] Right. So, the place we try to start is by integrating some sort of tests on the data itself. So ensuring that your data is of the same schema, that you have high-quality data, like a column or feature hasn’t just been dropped, or the distribution of values in that feature hasn’t changed dramatically. And so a lot of the stuff that we’ve built into the machine learning platform, especially on the dataset profiling side, is designed to help you with that, to help you with skew testing and analyzing. Is your data too different, to the point where you shouldn’t be training on it? Or is your data too similar, or is it in that sweet spot where the same training pipeline is actually applicable to go and solve the problem? That’s on the profiling side.
And then we also have some advanced capabilities on the drift side. So analyzing, over time, how are the inputs or features into your model changing? Whether that’s training versus scoring, or day over day, week over week, month over month of the data coming into your model when it’s making predictions. Has the shape of that data changed over time? Do you still trust the model based on the input values? And then, of course, you have the other end of it too, which is looking at the predictions the model is making, whether it’s from a business application.
So say I’m using the Outlook app on my phone and I’ve got the smart reply model running there. Now either they didn’t click on any of my suggestions, they clicked on a different suggestion from the one I did, they clicked on the top suggestion that I had, or they said, “I didn’t like any of these suggestions.” All those types of feedback come into telling you: is the quality of the data that you’ve trained your model on giving you a useful model on the prediction side?
So skew testing, validating your data’s quality and correctness, consistency between training and inference, all those things.
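
For a rough illustration of the kind of schema and skew checks being described, here is a minimal sketch in pandas and SciPy. The column names and drift threshold are placeholders, and Azure ML's dataset profiling and drift detection provide managed versions of this; the sketch just shows the underlying idea.

```python
import pandas as pd
from scipy.stats import ks_2samp

# Hypothetical expected schema for the training data
EXPECTED_COLUMNS = {"age": "int64", "income": "float64", "region": "object"}

def check_schema(df):
    """Return a list of schema problems: missing columns or unexpected dtypes."""
    problems = []
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            problems.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    return problems

def check_skew(train, scoring, threshold=0.2):
    """Flag numeric features whose distribution differs between training and
    scoring data, using a two-sample Kolmogorov-Smirnov statistic."""
    drifted = {}
    for col in train.select_dtypes("number").columns:
        stat, _ = ks_2samp(train[col].dropna(), scoring[col].dropna())
        if stat > threshold:
            drifted[col] = stat
    return drifted
```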

Sam Charrington: [00:13:46] Okay. So I’m kind of pulling at threads here, but maybe taking a step back: you talked a little bit about a maturity model, that when you look at customers, they kind of fall into these different buckets. Is there a prerequisite for starting to think about MLOps?

Jordan Edwards: [00:14:05] So I think the prerequisite is you have to have a desire to apply a model to a business need. If your only goal is to write a model to, say, publish a paper and say, “Hey, I have this model to solve this cool problem,” then you don’t really need any of the MLOps stuff. And if you’re just mucking around in a Jupyter notebook trying some different things by yourself, it’s also a stretch to say, like, “Oh, you need these MLOps practices now.” But the second you go beyond I’m keeping all my notes in Jupyter, or I’m dumping them into OneNote somewhere and just keeping track of all my experiments on my own, the second you want collaboration or reproducibility or the ability to scale up and scale out to run your jobs in the cloud, that’s where MLOps starts coming into play.

Sam Charrington: [00:14:52] I agree that collaboration is a big driver, but even an individual researcher that’s tracking hyperparameters in file names or on Post-it Notes or something even worse can benefit from some elements of the tooling that we kind of refer to as MLOps. Would you agree with that?

Jordan Edwards: [00:15:11] I would, yeah. But just trying to sell them on using everything from the very beginning is a tougher sell. So we start by saying, just start by tracking your work.
So the whole process maturity flow is: you start with work tracking, then making sure everything’s in a reproducible pipeline, and then making sure that others can go and take advantage of that pipeline. And then you actually have the model that you can go and use in other places.
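
As a concrete example of that first "just track your work" step, here is a minimal sketch using the Azure ML Python SDK (the v1 azureml-core APIs); the experiment name, metrics, and file paths are made up for illustration:

```python
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()                        # reads the workspace config.json
exp = Experiment(workspace=ws, name="churn-model")  # hypothetical experiment name

run = exp.start_logging()                           # interactive run, e.g. from a notebook
run.log("learning_rate", 0.01)                      # track hyperparameters...
run.log("accuracy", 0.92)                           # ...and metrics, so runs are comparable later
run.upload_file("model.pkl", "outputs/model.pkl")   # keep the trained artifact with the run
run.complete()
```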

Sam Charrington: [00:15:35] Okay. Yeah. I liked the way you pulled that together, because in a lot of ways one of the questions that I’ve been kind of noodling around for a while now is where does MLOps start and end relative to platforms and tooling and the things that enable and support MLOps. And it’s very much like the conversation we were having around DevOps, like DevOps isn’t, you know, containers and Kubernetes and things like that. DevOps is a set of practices, and it’s very much, to your point, that end-to-end process. So you might need any one of a number of the tools that someone might use to enable MLOps, but that doesn’t necessarily mean that you need MLOps.

Jordan Edwards: [00:16:22] Right. And sure, I work on Azure Machine Learning. When I’m talking to customers about, well, how does MLOps actually work? You’re going to have at least three different tools and technologies being used, right? ‘Cause you have three different personas. You have data engineering, data science and DevOps/ML engineering, which means you’re going to have some sort of a data pipelining tool, something like Data Factory or Airflow in the open-source world.
Something to help with managing your training pipelines, whether it’s Azure ML as a managed service or something like Kubeflow if you’re in the open-source community. And then same thing on the release management side, whether you’re using Azure DevOps or GitHub Actions or you’re running your own Jenkins server. Either way, there’s gonna be at least those three different types of tools with different personas, and they all need to work together and interoperate. So that’s another key part of our pitch: make sure that you’re being flexible in how you’re producing and consuming events, because MLOps is more than just model ops and you need to make sure it fits into the data and dev sides of the house.

Sam Charrington: [00:17:28] Yeah. Yeah. Awesome. You mentioned Azure DevOps playing a role in here and Jenkins on the open-source side. These are tools that, from the DevOps perspective, you associate with CI/CD, continuous integration and continuous delivery. The idea being that there’s a parallel on the model deployment side. Can you elaborate a little bit on how those tools are used?

Jordan Edwards: [00:17:51] Yeah. So the way we like to look at it from a DevOps point of view is we wanna treat a model as a packaged artifact that can be deployed and used in a variety of places. So you have your pickle file or whatever, but you also have the execution context for… I can instantiate this model into a class in Python, or I can embed it into my Spark processing pipeline, or I can deploy it as an API in a container onto a Kubernetes cluster, something like that. So it’s all about how do you bring the model artifact in as another thing that can be used in your release management process flow.
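
To make the "model as a packaged, deployable artifact" idea concrete, here is a sketch using the v1 Azure ML Python SDK. The file names, scoring script, and ACI target are assumptions for illustration, not a prescribed setup:

```python
from azureml.core import Workspace
from azureml.core.model import Model, InferenceConfig
from azureml.core.environment import Environment
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()

# Register the trained artifact (it doesn't have to be a pickle; ONNX, SavedModel, etc. also work)
model = Model.register(workspace=ws,
                       model_path="outputs/model.pkl",   # hypothetical local path
                       model_name="churn-model")

# Package the execution context: scoring script plus environment definition
env = Environment.from_conda_specification("churn-env", "environment.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deploy as a container-backed web service (ACI here; AKS would be the Kubernetes-scale option)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)
service = Model.deploy(ws, "churn-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
```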

Sam Charrington: [00:18:27] It does not have to be a pickle file, it could be…

Jordan Edwards: [00:18:30] It could be anything, exactly. Yeah. This is my serialized graph representation, here’s my code file, my config that I’m feeding in. So a model is just like any other type of application. It just happens to come from, or have some sort of association to, a machine learning framework, and to have come from some data. Which is actually another important part of the MLOps story: what does the end-to-end lineage look like, right? So ideally you should be able to go from I have this application that’s using this model, here’s the code and config that was used to train it, and here is the data set that this model came from. Especially when we’re talking to customers in more of the highly regulated industries.
So, healthcare, financial services. Say you have a model deployed that’s determining if it’s gonna approve or reject somebody for a loan. You need to be very careful that you’ve maintained your full audit trail of exactly where that model came from, in case somebody decides to come in and ask further about it. This also becomes more complicated the more of a black box your model is.
But in general, the goal of having all of these different technologies work together and interoperate is so that you can track sort of your correlation ID or correlation vector across your entire data and software and modeling landscape.

Sam Charrington: [00:19:56] We talk about that end-to-end lineage. Is that a feature? You use tool X, you use Azure ML, and click a button and you have that? Or is it more than that, a set of disciplines that you have to follow as you’re developing the model?

Jordan Edwards: [00:20:15] So yeah. The latter leads to an enablement of the former. So assuming that you use the-

Sam Charrington: [00:20:23] I think you’re an “all of the above” guy.

Jordan Edwards: [00:20:26] Yeah, yeah, yeah. You’re teeing it up right. So when it comes to using the tools the right way, sure, you could just have a random CSV file that you’re running locally to train a model on. But if you wanna assert you have proper lineage of your end-to-end ML workflow, that CSV file should be uploaded into blob storage and locked down and accessed from there, to guarantee that you can come back a year later and reproduce where this model came from.
Same thing on the code and packaging and the base container images that you’re using when you’re training the model. All that collateral needs to be kept around. And what does that allow you to do? So, we have, inside of the machine learning service, an internal metastore that keeps track of all the different entities and the edges that connect them together. And right now we have sort of a one-hop exposure of that. But one of the things we’re working on is a more comprehensive way to peruse the graph. So it’s like, “Hey, across my enterprise, show me every single model that’s been trained using this dataset.” Not scoped to a single project that my team is doing, but across the entire canvas.
Show me everybody using this data set.  What types of features are they extracting from it?  Is somebody doing work that’s similar to mine? Can I just fork their training pipeline and build on top of it? And going back to how has this work we’ve done for internal teams inspired the work we’re doing on Azure?  That’s probably the most powerful part of our platform for internal Microsoft teams is the discovery, the collaboration, the sharing. That’s what allows you to do ML at high scale, at high velocity.
And so we want to make sure as much as we can that the tools and technologies that we have on Azure provide that same capability,  with all of the enterprise-ready features that you would come to expect from Microsoft and Azure.
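
A sketch of what locking the data down and wiring it into that lineage graph can look like with the v1 Azure ML SDK; the datastore path and all of the names are placeholders:

```python
from azureml.core import Workspace, Dataset
from azureml.core.model import Model

ws = Workspace.from_config()

# Register a versioned dataset backed by files in locked-down blob storage
datastore = ws.get_default_datastore()
ds = Dataset.Tabular.from_delimited_files(path=(datastore, "curated/loans/2019-11.csv"))
ds = ds.register(workspace=ws, name="loan-applications", create_new_version=True)

# Register the model with a pointer back to the dataset it was trained on,
# so the lineage graph can answer "which models were trained on this data?"
model = Model.register(workspace=ws,
                       model_path="outputs/model.pkl",
                       model_name="loan-approval-model",
                       datasets=[("training data", ds)])
```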

Sam Charrington: [00:22:27] Yeah. So in that scenario you outlined the starting place as a dataset that’s uploaded to blob storage. Even with that starting place, you’ve kind of disconnected your ability to do lineage from the source dataset, which may be in a data warehouse or something like that. Is there also the ability to point back to those original sources?

Jordan Edwards: [00:22:55] Oh yeah. Sometimes you’ll have a CSV there, but you can also connect to a SQL database or to your raw data lake and have a tracking of: okay, this is the raw data. Here’s, say, the Data Factory job that did all these transformations. Here’s my curated dataset. Here’s all the derivations of that data set. Here’s the one I ended up using for training this model. I took this model and transfer-learned on top of it to produce this new model. And then I deployed this model as this API, and you can trace things all the way back to there. And then going the other way, when this model is now running, I can be collecting the inputs coming into my model and the predictions my model is making.
I log those into Azure Monitor, and then my data engineer can set up a simple job to take that data coming in and put it back into the lake, or put it back into a curated data set that my data scientist can now go and experiment on and say, “Well, how’s the data coming into my deployed model compared to when I trained it?” That’s completing the circle back to the beginning.
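
As a rough sketch of that "collect inputs and predictions and feed them back" loop, here is a generic scoring entry point that emits structured JSON log lines which a downstream data-engineering job could land back in the lake; the model path, request shape, and field names are assumptions for illustration:

```python
import datetime
import json
import logging

import joblib
import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scoring")
model = joblib.load("model.pkl")  # hypothetical serialized model

def run(raw_request: str) -> str:
    """Score a request and log inputs plus predictions for later drift analysis."""
    payload = json.loads(raw_request)
    df = pd.DataFrame(payload["data"])
    preds = model.predict(df).tolist()

    # One structured record per request: a downstream job can collect these,
    # put them back in the lake, and join them against the training dataset.
    logger.info(json.dumps({
        "ts": datetime.datetime.utcnow().isoformat(),
        "inputs": payload["data"],
        "predictions": preds,
    }))
    return json.dumps({"predictions": preds})
```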

Sam Charrington: [00:23:57] Nice. Nice. Which, conceivably, means that as opposed to asking what models this data set has produced, you could point to a particular row or value in a data warehouse or something like that and say what’s been impacted by this particular data point.

Jordan Edwards: [00:24:17] Exactly. And that’s the value that we’re trying to get out of the new generation of Azure Data Lake Store and some of the work we’re doing on the Azure Data Catalog side: to give you exposure into what’s all the cool stuff that’s being done, or not being done, with this data.
It goes back to letting your decision-makers know: am I accruing business value from these ETL pipelines that I’m spending all these compute dollars on to go and cook these curated data sets?
And that’s a large part of what our larger ML platform team did before as well: we helped with creating curated data sets for Bing and Office to go and build models on top of. So we had the data engineering pipelines and the machine learning pipelines and the release management pipelines all under the same umbrella. Which helped to inform the way we’re designing the system now: to meet enterprises where they are and help them scale up and out as they go.

Sam Charrington: [00:25:16] I’m curious, what are some of the key things that you’re learning from customers kind of on the ground who are working to implement this type of stuff? How would you characterize where folks are, if you can generalize, and what are the key stumbling blocks?

Jordan Edwards: [00:25:35] So if we were to think about it in terms of four phases, where phase one is kicking the tires, phase two is the model is reproducible, phase three is the model is deployed and being used, and phase four is I have all the magical automated retraining wizardry: they’re mostly between phase one and phase two right now.
Very few of them have actually gotten a model deployed into the wild. If they have it deployed, it’s only deployed as a dev/test API. They don’t trust it yet. So that’s one learning: customers are a lot earlier in the journey than we’d been expecting, coming from doing this for internal Microsoft teams.
Another one is that for the customers we’re talking to, their internal organizations are not always structured to let them innovate most effectively.
So they’ll have parts of their org, their data team and their IT department and their research teams, that are totally disconnected, disjointed, don’t communicate with each other, don’t understand each other. And so IT just sees what the researchers are doing and says, “There’s no way you’re doing any of this in production.” And data engineers are unsure what data the data scientists are using. A data scientist might be off running SQL queries on the side, but they have no idea from which tables, and the tables will disappear out from under the data scientist.
So instead of doing a pure, “Okay, here’s how to use the platform,” it’s more, “Hey, let’s get all the right people in the room together from IT and research and your data platform and your software development platforms, and start a conversation and build up the domain expertise and the relationships on the people side before you get started with the process or the platform.” That’s been, yeah, one big learning: to step back and focus on getting the right people involved first, and then they can figure out the process that’s going to work well for their business. And then they can adopt the platform tools that we’ve been building to help them be more efficient at doing end-to-end ML.

Sam Charrington: [00:27:38] Are you finding that there’s a pattern in organization that allows organizations to move more quickly? Like centralized versus decentralized, or quote unquote ‘center of excellence’ or embedded into business units? Are there any of those that work best?

Jordan Edwards: [00:27:58] I think what we’ve seen work best is to have one business unit sort of act as the incubator to vet the end-to-end flow and actually get a model working in production. But then have the overall center of excellence, a centralized team, observe what they’re doing and take notes, and let them flesh out what the canonical reference MLOps architecture and pipeline should look like.
So out of all the patterns, and we’ve seen a lot of patterns being applied, that one seems to be the best so far: let a small team, give them some flexibility to go and build a model, take it to production with some light guardrails, and they can build out the reference architecture, the Git repository and CI/CD pipeline templates that the rest of the teams in the company can use.

Sam Charrington: [00:28:51] And is the salient point there that the end business unit that has the problem owns the deployment of the model, as opposed to the centralized but somewhat disconnected data science or AI COE?

Jordan Edwards: [00:29:06] Yes. So your DevOps team for your business unit needs to know and understand the fact that a model is gonna be entering their ecosystem, and needs to be able to manage it with the same tools they manage their other application releases with. Hence the integration with Azure DevOps, to make sure that all your pipelines are tracked and managed in one place, and there’s not this one rogue release pipeline that’s coming in and causing issues and havoc with the rest of your production system.

Sam Charrington: [00:29:34] And generally, when you look at these production pipelines, do the pipelines and the tooling resonate with the DevOps teams, or are they like this strange beast that takes a long time for them to wrap their heads around?

Jordan Edwards: [00:29:48] So they freak out until they see the Azure DevOps integration. Then they’re like, “Oh, okay, I understand that.” Hence why I’m like, you need to have the tools that your audience can understand. You show them a Jupyter notebook, they’ll jump out of their seats and run away scared. Whereas you show them, “Oh, here’s a managed multi-phase release pipeline with clearly defined declarative YAML for the different steps,” that resonates well with them. Whereas data scientists, you show them a big complex approval flow and they’re going to be like, “I’m never using any of this.” You show them a Jupyter notebook, they’re happy, or an IDE with low-friction Python. And then your data engineers, again, you show them a confusing notebook process flow, they’re not going to like that as much, but you show them a clean ETL flow where they can drag and drop and run their SQL queries and understand, are their pipelines running in a stable fashion? That resonates with them. So yeah, different personas, different tools; they need to work together and figure out what process is going to work for their business needs.

Sam Charrington: [00:30:48] As I’ve kind of looked at primarily this machine learning engineer role that has been emerging over the past few years, and now we’re talking about the DevOps engineer as a separate thing, but the line is kind of a gray, moving, and blurred line, right?

Jordan Edwards: [00:31:02] Yeah. What we’ve seen in terms of… We’ve had customers ask us, “Well, how do we hire these ML engineers?” And it’s like, basically, you need a person who understands DevOps but also can talk to your data scientists, or can [laughs], can figure out the work they’re doing, help them get their work into a reproducible pipeline on the training side, and help with deploying the model and integrating it into the rest of your application lifecycle management tools. So yeah, your ML engineer needs to be a DevOps person with some understanding of ML.

Sam Charrington: [00:31:33] And is a DevOps person necessarily a software engineer that is coding a model?

Jordan Edwards: [00:31:40] Not necessarily. They just need to be really good at operational excellence. So do they understand how to write things declaratively? How to set up process control flows so that things work nicely end-to-end? Like, you don’t need to understand the ML the data scientist is doing. You need to understand the process they’re going through to produce that model. So they have a bunch of code in a Jupyter notebook, help them factor it into modules that you can stitch together. But you don’t need to understand the machine learning framework that they’re using specifically in that context.

Sam Charrington: [00:32:15] You’ve mentioned Jupyter notebooks a few times. One of the things that I see that folks are trying to figure out is, like, should we do ML in notebooks or should we do ML in IDEs? Microsoft has a huge investment in IDEs. But you’ve also been, in Visual Studio Code, making it more kind of interactive, integrated, kind of real time, to incorporate some of the notebook-esque style of interaction.

Jordan Edwards: [00:32:43] Right. So we want it to be fluid to go from one to the other. We’ve seen the value in the interactive canvases for doing rapid-fire experimentation. We’ve also talked to large companies like Netflix to learn how they use notebooks and automation at scale.

Sam Charrington: [00:33:00] Your Papermill project for example?

Jordan Edwards: [00:33:02] Exactly. So we’ve actually integrated Papermill into our platform as well. So if you’re designing your training pipeline, you can stitch together a mix of scripts and notebooks and data processing steps, and we try to be as fluid as we can. And we’re working with the developer division as well to figure out how to more cleanly integrate notebooks into our IDE experiences. And you saw some of that on the VS Code side, and there’s more stuff coming to help with that.
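
For reference, the heart of what a Papermill-based pipeline step does is parameterized, headless notebook execution; a minimal sketch, with the notebook names and parameters made up for the example:

```python
import papermill as pm

# Execute a parameterized training notebook headlessly, e.g. as one step of a pipeline;
# the output notebook doubles as a record of exactly what ran and with which parameters.
pm.execute_notebook(
    "train_model.ipynb",              # hypothetical input notebook with a "parameters" cell
    "outputs/train_model_run.ipynb",  # executed copy, kept as an artifact of the run
    parameters={"learning_rate": 0.01, "epochs": 10},
)
```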

Sam Charrington: [00:33:31] We’ve talked a little bit about this automated retraining aspect of managing model life cycles. Are there other aspects of managing model life cycles that you find important for folks to think about?

Jordan Edwards: [00:33:44] Yeah, knowing when to retrain the model is one thing. Knowing when to deprecate the model is another thing too. So say that the data the model is trained with is stale, or can’t be used anymore, or got removed for GDPR reasons. This is why having the whole lineage graph is so important, to be able to figure out exactly what data was used to train the model. Other things around model life cycle management: know who is using it, know where the model is running. Know if the model is adding business value. Know if the data coming into the model has changed a lot since you trained it. Know if the model is dealing with some type of seasonal data and needs to be retrained on a seasonal basis. And then also, know the resource requirements for your model. So another big thing we see trip a lot of our customers up is they train the model on these big beefy VMs with massive GPUs and then you go to deploy and it’s like, “Hey, my model’s crashing. What do I do?”
And so we’ve tried to build tooling in to help with that as well. So profiling your model, running sample queries into it, different sizes of sample queries too, not always the same thing, and making sure, you know, does your model have enough CPU and memory and the right-sized GPU to perform effectively. We’re also doing some work on the ONNX framework to help with taking those models and quantizing them or optimizing them for a specific business use case on the hardware side. Which is really coming in, especially as we have customers in the manufacturing sector who want to run models quickly on the edge, on small hardware. So how do you manage that transition from this model I trained on this beefy machine to this model running on this tiny device?
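
As an illustration of the ONNX side of that edge story, here is a sketch of shrinking a model with ONNX Runtime's dynamic quantization. The file paths are placeholders, and whether quantization is appropriate depends on the model and its accuracy budget:

```python
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic

# Quantize weights to int8 to cut model size and speed up CPU/edge inference
quantize_dynamic("model.onnx", "model.int8.onnx", weight_type=QuantType.QInt8)

# Sanity-check that the quantized model still loads and exposes the expected inputs
session = ort.InferenceSession("model.int8.onnx")
print([inp.name for inp in session.get_inputs()])
```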

Sam Charrington: [00:35:33] Are you finding that most customers are deploying models, or even thinking about them, individually: I’ve got this model that I’ve created and I’m going to think about the way I deploy this model? Versus, I’ve got a model, I built it to a standard, it’s just like any other model, and then I’m going to just kind of throw it into my model deployment thing. Are they there yet?

Jordan Edwards: [00:35:58] Some of them are there. The ones that have been doing this for a while longer have developed, like, their template for their model deployment flow. We try to provide as much tooling as we can in the platform and in the registry for you to track all the relevant things about it. But really it’s just getting the model deployed into your existing app ecosystem, making sure that you have the ability to do controlled rollout and A/B testing. ‘Cause you don’t want to just always pave over the previous model. So the most advanced customers are just getting to that point now where they’re ready to start doing A/B testing and looking for our help to go and do that.

Sam Charrington: Yeah. So along the lines of testing, we’ve talked about this a little bit. There’s both the online testing of your model’s freshness, but then also all kinds of deployment scenarios that have been developed in the context of DevOps. Like canary, then red, green, blue kind of stuff. All the colors, right? So do you see all of that stuff out in the wild?

Jordan Edwards: Yes. The main difference we’ve seen with models compared to normal software being rolled out is oftentimes they’ll develop a model and test it offline in batch for a while before using it.
So they wouldn’t necessarily need to deploy it to receive real traffic right away. They’ll get the new model, they’ll wait a week, run the model in batch against the past week’s worth of data, and then compare how different it is. So it’s just the fact that you can test the model offline as opposed to having to do everything in an online fashion. That’s probably the biggest delta. But otherwise we see all the same patterns as with normal software.
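
A minimal sketch of that offline comparison pattern: replay last week's data through both the current and the candidate model and compare before any live traffic is involved. The file paths, model format, and metric are assumptions for illustration:

```python
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

batch = pd.read_parquet("scoring_logs/last_week.parquet")   # hypothetical replay data
X, y = batch.drop(columns=["label"]), batch["label"]

current = joblib.load("models/current.pkl")
candidate = joblib.load("models/candidate.pkl")

cur_preds, cand_preds = current.predict(X), candidate.predict(X)

print("agreement between models:", (cur_preds == cand_preds).mean())
print("current accuracy:  ", accuracy_score(y, cur_preds))
print("candidate accuracy:", accuracy_score(y, cand_preds))
# Only if the candidate looks better offline does it move on to a controlled online rollout.
```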

Sam Charrington: [00:37:46] Because you’re testing two things, right? You’re testing the model’s statistical ability to predict something, but then it’s also software. And you don’t necessarily want to put a broken piece of software out there.

Jordan Edwards: [00:37:57] Right. Especially because it’s software with uncertain behavior, or more uncertain behavior than any normal software application you’d throw out there.

Sam Charrington: [00:38:06] What can we look forward to in this space from your perspective?

Jordan Edwards: [00:38:10] So as far as things to look forward to, there’s lots of investment coming in improving our story around enterprise readiness, making it easier for customers to do secure data science and ML workloads. Work to help improve collaboration and sharing across the enterprise: how do I figure out which other teams have been doing modeling work similar to mine? How do I take advantage of that? So, accelerating collaboration and velocity, more work on the enterprise readiness front, and then a tighter-knit integration with the rest of the big data platform stuff. So integration with data lake, data catalog, Data Factory, DevOps, GitHub, and it’s all about helping customers get to production ML faster.

Sam Charrington: [00:38:55] Well, Jordan, thanks so much for chatting with me.

Jordan Edwards: [00:38:57] Thanks for having me. Yeah, appreciate it.