Have a great idea for a new machine learning-powered product or feature, but not quite sure of all the steps needed to make your vision a reality? You are not alone! Many people, including practicing data scientists and machine learning engineers or developers, struggle to put together the pieces needed to go from idea to product. Fortunately, this week Sam interviews Emmanuel Ameisen, a machine learning engineer at Stripe, who just published a book on the topic: Building Machine Learning Powered Applications: Going from Idea to Product.
Emmanuel began his career as a data scientist and went on to become an AI program lead at Insight Data Science, where he mentored over a hundred Ph.D. fellows looking to transition into machine learning. His new book is the culmination of what he learned, and provides a guide for aspiring and practicing engineers and data scientists on how to approach ML projects systematically.
Structuring End-to-End Machine Learning Projects
In this interview, as in the book, Emmanuel shares his best practices for structuring and building projects. Emmanuel approaches new ML projects in four main stages:
- Formulating the problem and creating a plan: Here we want to think about the best possible approach to solving our specific problem. The goal is to simplify, simplify, simplify, and have a clear understanding of what your success metrics are before you start to build anything.
- Building a working pipeline and acquiring an initial dataset: Emmanuel recommends building an end-to-end data processing pipeline, albeit a simple one, right from the start, and walks us through how to test and evolve it. Like your pipeline, your dataset is also something you’ll want to iterate on. Your data should inform your features and models, and not the other way around.
- Iterating on your models: Model development is inherently iterative, and Emmanuel shares his approach to developing and evaluating models. Evaluation depends on your ability to choose the metric that is most appropriate for your problem, and tools like confusion matrices, ROC curves, calibration curves, and various approaches to visualization can all come into play when trying to debug your models. Evaluating feature importance can also help here, as it allows you to check your assumptions about the problem.
- Deployment and monitoring: A number of non-technical and technical considerations come into play when unleashing your models to the real world. First off, we need to consider the ethical implications of our models as well as concerns like data ownership and bias. From a technical perspective, we need to choose a deployment option that makes sense for the way the model will be accessed by its users. We also want to build safeguards and sanity checks to protect us from model failures, and monitor the model’s predictions over time.
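To make stages three and four a little more concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the book or the interview. The first two functions compute confusion-matrix counts, precision, and recall for a binary classifier; the third wraps a model call in simple sanity checks. The names `confusion_counts`, `precision_recall`, and `guarded_predict`, the `fallback` value, and the `max_chars` limit are all assumptions chosen for the example.

```python
from typing import Callable, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def precision_recall(counts: dict) -> tuple:
    """Derive precision and recall from confusion counts (0.0 if undefined)."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


def guarded_predict(
    model: Callable[[str], float],
    text: str,
    fallback: float = 0.5,      # hypothetical "no opinion" score
    max_chars: int = 10_000,    # hypothetical input size limit
) -> float:
    """Run a model with input and output sanity checks.

    Returns `fallback` instead of raising when the input looks wrong,
    the model fails, or the score falls outside [0, 1].
    """
    # Input check: reject non-strings, empty text, and oversized text.
    if not isinstance(text, str) or not text.strip() or len(text) > max_chars:
        return fallback
    try:
        score = model(text)
    except Exception:
        # A production system would also log the failure for monitoring.
        return fallback
    # Output check: a probability-like score must lie in [0, 1].
    if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        return fallback
    return float(score)


if __name__ == "__main__":
    counts = confusion_counts([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
    print(counts, precision_recall(counts))
    print(guarded_predict(lambda t: 0.9, "A short question"))
```

In practice you would likely reach for an established library for the metrics and add logging around the safeguard; the point of the sketch is only that both evaluation and failure handling can start out very simple and be iterated on.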
Emmanuel emphasizes the broader end-to-end ML process over just the training of the model because the growing maturity of ML and AI tools has simplified that part of the process. “It used to be that training up a model was pretty hard, and you needed a team of people that understood the internals deeply, but now because the tooling has evolved so much…and the tuning is so good, and the courses are so good…, it becomes relatively simpler than the rest [of the process].”
To support the concepts taught in the book, readers are guided through the process of building a predictive text application that combines heuristics, rules, models and engineering work. Emmanuel put considerable thought into choosing predictive text as the learning example: “Writing and assisting people to write better is a crucial example where you can check for grammar (that’s just rules), or you can check for vocabulary or a variety of things, and you can also help them improve their style… So it was a nice blend that reflects what happens in the real world.”
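As a rough illustration of the "that's just rules" part of such an application, the sketch below implements a rules-only baseline for writing feedback. It is a hypothetical example, not code from the book: the function name `writing_feedback`, the two rules, and the 30-word threshold are assumptions made for the illustration. A baseline like this could later sit alongside a trained model in the same pipeline.

```python
import re


def writing_feedback(text: str, max_sentence_words: int = 30) -> list:
    """Return rule-based suggestions for a piece of text.

    Two illustrative rules only: immediately repeated words and
    overly long sentences. The threshold is an arbitrary assumption.
    """
    suggestions = []

    # Rule 1: flag immediately repeated words such as "the the".
    for match in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        suggestions.append(f"Repeated word: '{match.group(1)}'")

    # Rule 2: flag sentences longer than the word limit.
    # Sentences are split naively on '.', '!' and '?'.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    for sentence in sentences:
        n_words = len(sentence.split())
        if n_words > max_sentence_words:
            suggestions.append(f"Long sentence ({n_words} words)")

    return suggestions


if __name__ == "__main__":
    for suggestion in writing_feedback("This is is a test."):
        print(suggestion)
```

Starting from rules like these gives you something that runs end to end on day one, which fits the advice above to build a simple working pipeline first and iterate from there.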
To further explore Emmanuel’s take on building end-to-end ML pipelines to realize your ideas, be sure to check out the interview. We only scratched the surface in this article. And of course, we highly recommend the book, which is available here!
(You support the show by using our affiliate links to purchase your copy of the book.)
Connect with Emmanuel!
- Grab the book! Building Machine Learning Powered Applications: Going from Idea to Product
- Bridging the Gap Between Academic and Industry Careers with Ross Fadely
- Live from TWIMLcon! Overcoming the Barriers to Deep Learning in Production with Andrew Ng
- Human-in-the-Loop AI for Emergency Response and More with Robert Munro
- Scaling Model Training with Kubernetes at Stripe with Kelley Rivoire
“More On That Later” by Lee Rosevere, licensed under CC BY 4.0