Turning Ideas into ML Powered Products with Emmanuel Ameisen
EPISODE 349 | FEBRUARY 17, 2020
About this Episode
Have a great idea for a new machine learning-powered product or feature, but not quite sure of all the steps needed to make your vision a reality? You are not alone! Many people, including practicing data scientists, machine learning engineers, and developers, struggle to put together the pieces required to go from idea to product. Fortunately, this week Sam interviews Emmanuel Ameisen, a machine learning engineer at Stripe, who just published a book on the topic — Building Machine Learning Powered Applications: Going from Idea to Product.
Emmanuel began his career as a data scientist and went on to mentor over a hundred Ph.D. fellows looking to transition into machine learning as an AI program lead at Insight Data Science. His new book is the culmination of what he learned, and provides a guide for aspiring and practicing engineers and data scientists on how to approach ML projects systematically.
Structuring End-to-End Machine Learning Projects
In this interview, as in the book, Emmanuel shares his best practices for structuring and building projects. Emmanuel approaches new ML projects in four main stages:
- Formulating the problem and creating a plan: Here we want to think about the best possible approach to solving our specific problem. The goal is to simplify, simplify, simplify, and have a clear understanding of what your success metrics are before you start to build anything.
- Building a working pipeline and acquiring an initial dataset: Emmanuel recommends building an end-to-end data processing pipeline, albeit a simple one, right from the start, and walks us through how to test and evolve it. Like your pipeline, your dataset is also something you'll want to iterate on. Your data should inform your features and models, and not the other way around.
- Iterating on your models: Model development is inherently iterative, and Emmanuel shares his approach to developing and evaluating models. The latter depends on your ability to choose the evaluation metric most appropriate for your problem, and tools like confusion matrices, ROC curves, calibration curves, and various approaches to visualization can all come into play when trying to debug your models. Evaluating feature importance can also help here, as it allows you to check your assumptions about the problem.
- Deployment and monitoring: A number of non-technical and technical considerations come into play when unleashing your models to the real world. First off, we need to consider the ethical implications of our models as well as concerns like data ownership and bias. From a technical perspective, we need to choose a deployment option that makes sense for the way the model will be accessed by its users. We also want to build safeguards and sanity checks to protect us from model failures, and monitor the model's predictions over time.
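To make the pipeline, evaluation, and safeguard ideas above concrete, here is a minimal sketch in Python using scikit-learn and synthetic data. It is an illustration of the general workflow, not code from the book or episode; the dataset, confidence threshold, and `safe_predict` helper are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A simple end-to-end pipeline on an initial (here, synthetic) dataset.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

# Iterate on the model, guided by an evaluation metric and a confusion matrix.
probs = pipeline.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, probs))
print(confusion_matrix(y_test, pipeline.predict(X_test)))

# Inspect feature importance (here, linear coefficients) to sanity-check
# assumptions about the problem.
print("coefficients:", pipeline.named_steps["logisticregression"].coef_)

# A deployment-time guardrail: fall back to a safe default (None) when the
# model is not confident enough. The 0.6 threshold is an assumption.
def safe_predict(features, threshold=0.6):
    p = pipeline.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]
    return int(p >= threshold) if max(p, 1 - p) >= threshold else None
```

The guardrail illustrates the kind of sanity check mentioned above: instead of always returning a prediction, the system can abstain and defer to a fallback when confidence is low.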
About the Guest
Emmanuel Ameisen
Anthropic
Resources
- Grab the book! Building Machine Learning Powered Applications: Going from Idea to Product
- Bridging the Gap Between Academic and Industry Careers with Ross Fadely
- Live from TWIMLcon! Overcoming the Barriers to Deep Learning in Production with Andrew Ng
- Human-in-the-Loop AI for Emergency Response and More with Robert Munro
- Scaling Model Training with Kubernetes at Stripe with Kelley Rivoire