Today was the sixth day of TWIMLcon and the final day of presentations before we head into a full day of workshops and then a wrap-up unconference. Today we were fortunate to speak to folks from LinkedIn, Intuit, Cloudera, Yelp, Rakuten, Microsoft, Salesforce, and Fiddler. We covered a variety of subjects including:
- how to build out an ML Platform team and gain success over time;
- that there is now an MDLC (Model Development Lifecycle) to go along with our SDLC (Software Development Lifecycle);
- how you should support your data team on “Day 1” and also “Day N”;
- how you can work better with the business by providing transparency and visibility, education, and shared ROI analysis;
- three key persona groups and their differing requirements of an ML platform;
- how and why testing and experimentation are different but complementary;
- challenges to operationalizing ML, characteristics of a good ML platform, a rule of thumb for build vs. buy, and a powerful vendor-agnostic end-to-end ML stack architecture;
- tips and techniques for taking Responsible AI from a rare conversation to a key step in your overall enterprise software development lifecycle.
Ya Xu, Head of Data Science at LinkedIn, kicked things off by sharing her thoughts on the three stages of building a platform: the Build Phase, the Adoption Phase, and the Maturity Phase. She believes that if you build an easily extensible platform that solves your users’ needs, they’ll stick with your platform rather than try to build their own. She shared some shocking scale numbers, noting that LinkedIn often has over 500 experiments actively running at any given time. She summarized why we designed and built this entire conference:
“I’m a big fan of platforms. We have an experiments platform, our main ML platform, an artifact catalog (DataHub), an anomaly detection platform, and a distributed OLAP system for data storage. Platforms let you move faster.”
Ian Sebanja (Product Manager, Intuit Machine Learning Platform) and Srivathsan Canchi (Head of Engineering, ML Platform Team) shared their two-pronged approach to ensuring that costs are known and ROI can be calculated consistently.
The first prong was minimizing overhead for the data science team by ensuring that models, features, and resources are all tracked without requiring any input from them. They build on that by providing as much automation as possible to help the data science team be as effective as possible. Finally, they designed “smart defaults” so that users can spin up instances as needed and the infrastructure spins down automatically afterwards, avoiding unwanted infrastructure spend.
The second prong involved being as transparent and educational as possible. They surface all infrastructure cost information to developers at the point of execution so that they can make good business trade-offs between speed or performance and cost. The data team and the ops team then work from the same information when assessing and calculating the ROI of a given project.
Justin Norman, VP Data Science & Analytics at Yelp, joined us next and walked us through much of the Yelp ML platform stack architecture. He made the case that:
“The goal is to produce an ML system that functions reliably and that can be replicated at scale. We need to run experiments and we also need to test. These are different!”
He then provided us all with a simple rule of thumb: if we can fill in the following statement, we’re ready to proceed; if we can’t, we have more work to do before moving to the next step:
“If we [build this thing defined by the ML developer] then [this metric defined by the data scientist] will move because of [this change in behavior identified by the product manager.]”
Next up, Mohamed Elgendy, formerly of Amazon, Intel, Twilio, and Rakuten, and now CEO and co-founder of a new startup called Kolena.io, shared with us a comprehensive review of the MLOps space. It covered ML operationalization challenges, technical debt in AI/ML, a guess at what you have running today in your own shop, characteristics of a real (full) ML platform, some rules of thumb on build vs. buy, recommendations if you decide to build or buy, and finally an invitation to a new ML community he’s building at Kolena.io. Phew!
We then moved into a panel discussion with Romer Rosales (Head of Consumer AI, LinkedIn), Sarah Bird (Principal Program Manager, Microsoft), and Kathy Baxter (Principal Architect, Ethical AI Practice, Salesforce). Sarah kicked us off with a comment that echoed Diego Oppenheimer’s quote from earlier in the week:
“Last year the conversation was ‘How do we think about this?’ This year, the principles are known and it’s more about scaling Ethical AI.”
Romer made the case that while it takes a village to do Responsible AI, there is no option NOT to do it, and that executive support for your Responsible AI initiatives is key.
Sarah shared a tactic that their team uses which is to have a “ship room” where any team can come and work directly with Responsible AI experts to ensure that their product both meets the standards and will ship on time. Cool concept!
Kathy shared a tactic used by her team (borrowed from Timnit Gebru and Deb Raji) of providing “model cards” that act like nutrition labels on food, outlining the model lineage, data sources, training approach, and more.
All three shared examples of how their respective teams are directly impacting product releases. They shared a common belief that making Responsible AI a major element of the full software development lifecycle is a better approach than using it as a last-minute audit and enforcement function on product teams.
Our last session of the day was a workshop with Krishna Gade (CEO/Founder) and Rob Harrell (Product Manager) discussing how Fiddler provides Explainable Monitoring for AI/ML projects. They shared a couple of key points:
- Most models are a black box: In most cases we have no visibility into model performance, no monitoring to catch potential bias or drift, and no explanations of model behavior / predictions.
- Deployed AI systems are error-prone: Errors creep in through data bias, data drift, feature processing, data pipelines, model performance decay, and model bias; all of that impacts KPIs and the business (and possibly customers). A minimal drift-check sketch follows this list.
- We need APM for AI/ML: Product managers and developers have Application Performance Monitoring tools; we need the same thing for AI and ML.
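To make the drift point above a little more concrete, here is a minimal, hypothetical sketch (not Fiddler’s API, just an assumed illustration) of the kind of check an ML monitoring tool runs under the hood: comparing a feature’s training-time distribution to its recent production distribution using the Population Stability Index, one common drift metric. The function name, synthetic data, and alert thresholds are all illustrative assumptions.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a training-time (expected) distribution to a production (actual) one."""
    # Bin edges come from the training data so both samples share the same buckets.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty buckets to avoid division by zero / log(0).
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Synthetic stand-ins for one feature's training vs. recent production values.
train_values = np.random.normal(0.0, 1.0, 10_000)
prod_values = np.random.normal(0.3, 1.1, 10_000)

psi = population_stability_index(train_values, prod_values)
# 0.1 and 0.25 are commonly cited rule-of-thumb thresholds, not universal standards.
if psi > 0.25:
    print(f"ALERT: significant drift detected (PSI={psi:.3f})")
elif psi > 0.1:
    print(f"WARN: moderate drift (PSI={psi:.3f})")
```

In practice, a monitoring platform runs checks like this continuously across every feature and prediction stream and surfaces the results alongside explanations, which is exactly the “APM for AI/ML” gap Krishna and Rob were describing.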
We would like to thank Ya, Srivathsan, Ian, Priyank, Justin, Mohamed, Romer, Sarah, Kathy, Krishna, and Rob for sharing all their hard-earned lessons with us today.
We are now heading into the final two days of the conference. We have a full day of workshops tomorrow and then an unconference on Friday.
If you missed registration or were unable to attend, you can still register now (you’ll need the Pro Plus or Exec summit pass) and you’ll have full ongoing access to all the incredible sessions from the conference. The only thing you’ll miss is the awesome networking and the swag bag!