Today, nearly all data experimentation at Yelp, from product features to AI and machine learning, runs on Bunsen, a custom-built platform that hosts more than 700 experiments at any given time. Bunsen lets teams deploy experiments to large, segmented portions of Yelp's customer population, and it lets the company's data scientists roll those experiments back when needed.
However, adapting an A/B testing system built for digital products to support complex ML-powered use cases required advanced techniques, close cross-functional collaboration among product, engineering, and ML teams, and a distinctive design approach. This talk explores lessons learned and best practices for building robust experimentation workflows into production machine learning deployments.