Scaling Model Training with Kubernetes at Stripe with Kelley Rivoire

EPISODE 272 | JUNE 6, 2019

About this Episode

Today we're joined by Kelley Rivoire, an engineering manager working on machine learning infrastructure at Stripe. Kelley and I caught up at a recent Strata Data conference, where she presented the talk "Scaling model training: From flexible training APIs to resource management with Kubernetes." In our conversation, we discuss Stripe's machine learning infrastructure journey, including how the team started from a production focus rather than from answering internal business questions. Kelley also details a few of Stripe's internal tools, including Railyard, an API built to manage model training at scale. Finally, we discuss how end users dealt with the shift to event-based, streaming models.

About the Guest

Kelley Rivoire

Massachusetts Institute of Technology
