Scaling Model Training with Kubernetes at Stripe with Kelley Rivoire

EPISODE 272

About this Episode

Today we're joined by Kelley Rivoire, an engineering manager working on machine learning infrastructure at Stripe.

Kelley and I caught up at a recent Strata Data conference, where she presented the talk "Scaling model training: From flexible training APIs to resource management with Kubernetes." In our conversation, we discuss Stripe's machine learning infrastructure journey, including how they started from a production focus rather than from answering internal business questions. Kelley also details a few of their internal tools, including Railyard, an API built to manage model training at scale. Finally, we discuss how end users dealt with the shift to event-based, streaming models.

Connect with Kelley