Scaling Model Training with Kubernetes at Stripe with Kelley Rivoire
EPISODE 272 | JUNE 6, 2019
About this Episode
Today we're joined by Kelley Rivoire, engineering manager working on machine learning infrastructure at Stripe.
Kelley and I caught up at a recent Strata Data conference, where she presented the talk "Scaling model training: From flexible training APIs to resource management with Kubernetes." In our conversation, we discuss Stripe's machine learning infrastructure journey, including how the team started with a production focus rather than with answering internal business questions. Kelley also details a few of their internal tools, including Railyard, an API built to manage model training at scale. Finally, we discuss how end users dealt with the shift to event-based, streaming models.
About the Guest
Kelley Rivoire
Massachusetts Institute of Technology
