Scaling Deep Learning on Kubernetes at OpenAI with Christopher Berner

This Week in Machine Learning & AI

In this episode of our AI Platforms series we’re joined by OpenAI’s Head of Infrastructure, Christopher Berner.

Chris has played a key role in overhauling OpenAI’s deep learning infrastructure over the course of his two years with the company. In our conversation, we discuss the evolution of OpenAI’s deep learning platform, the core principles that have guided that evolution, and its current architecture. We dig deep into their use of Kubernetes and discuss various ecosystem players and projects that support running deep learning at scale on the open source project.

As many of you know, part of my work involves understanding the way large companies are adopting machine learning, deep learning and AI. While it’s still fairly early in the game, we’re at a really interesting time for many companies. With the first wave of ML projects at early adopter enterprises starting to mature, many of them are asking themselves how they can scale up their ML efforts to support more projects and teams.

Part of the answer to successfully scaling ML is supporting data scientists and machine learning engineers with modern processes, tooling and platforms. Now, if you’ve been following me or the podcast for a while, you know that this is one of the topics I really like to geek out on.

Well, I’m excited to announce that we’ll be exploring this topic in depth here on the podcast over the next several weeks. You’ll hear from folks building and supporting ML platforms at a host of different companies. We’ll be digging deep into the technologies they’re deploying to accelerate data science and ML development in their companies, the challenges they’re facing, what they’re excited about, and more.

In addition, as part of this effort, I’m publishing a series of eBooks on this topic. The first of them takes a bottom-up look at AI platforms and is focused on the open source Kubernetes platform, which is used to deliver scalable ML and infrastructure at OpenAI, Booking.com, Matroid and many more companies. It’ll be available soon on the TWiML website. It will be followed shortly thereafter by the second book in the series, which looks at scaling data science and ML engineering from the top down, exploring the internal platforms that companies such as Facebook, Uber, and Google have built, the process disciplines that they embody, and what enterprises can learn from them.

If this is a topic you’re interested in, I’d encourage you to visit twimlai.com/aiplatforms and sign up to be notified as soon as these books are published.

About Christopher

Mentioned in the Interview

“More On That Later” by Lee Rosevere, licensed under CC BY 4.0
