TWIMLcon Day 1: If You’re Serious About Your Data, Invest in Your Platforms

We had a solid kick-off today at TWIMLcon 2021: AI Platforms. The conference started today and runs through January 29, 2021. It’s not too late to join in! Use discount code GREATCONTENT for 25% off registration.

We started off the day talking to Solmaz Shahalizadeh, VP of Commerce Intelligence at Shopify. During her time there, she implemented the company’s first ML products, built their financial data warehouse, led multiple cross-functional teams, and played a critical role in their IPO. In our discussion today, she said something that set the tone for the rest of the conference:

“If you’re serious about your data, you want to invest in your platforms.”

We could not have said it better ourselves. She also shared lessons learned from building a team of hundreds of data scientists. One example: pay attention to how well each team member can articulate the real-world impact of a model and how it will solve a specific business problem.

Next up, Aman Khan (Product Manager) and Josh Baer (ML Platform Product Lead) talked about how Spotify built its ML platform to provide service to over 300 million customers. They shared a few of the key tenets that now guide their approach to delivering ML infrastructure:

Build infrastructure together: Having your infrastructure teams and your ML teams collaborate to build a common platform serves the organization best.
Be opinionated: Having more tools is not better. Fewer tools lead to less custom code, less technical debt, and less confusion for the development team.
Make difficult trade-offs: They focused hard on building a platform that served their ML Engineers first and foremost, with the idea that once they nailed that, they could extend it to other roles in the organization.

We then shifted gears (pun completely intended) and talked with Sudeep Pillai, ML engineering team lead at Toyota Research Institute. Sudeep shared an overview of the MLOps environment developed at TRI and discussed some of the key ways MLOps techniques must be adapted to meet the needs of high-stakes environments like robotics and autonomous vehicles. He noted that early autonomous driving systems were strongly rule-based and rigid, but that there has been a major shift away from rule-based systems. In his words:

“ML is eating the Autonomous Driving Stack.”

He further shared how ML has moved into the perception, prediction, planning, and control aspects of autonomous vehicle design. MLOps is sometimes thought of as “DevOps for Machine Learning,” but it was clear from Sudeep’s presentation that it needs to be more. MLOps at TRI is a complete set of processes specifically adapted not only for ML but also for the autonomous driving (AD) domain. It feels like the MLOps conversation is truly evolving and maturing when you see conversations like this one.

After speaking with Sudeep, we chatted with Mike Del Balso, CEO and Co-Founder of Tecton. He walked us through the issues of feature development, management, and deployment. It seems hard to believe, but he noted that just getting a few features into production can delay a project by months or even a year because of the hand-off between the data science and data engineering teams. He made an observation that is important to highlight here:

“Feature stores are some of the highest value data we have in our organizations and we don’t manage them as such.”

Mike went on to share some customer success stories (like Atlassian reducing model deployment times from months to days while increasing accuracy by up to 20%). Overall, it was a great discussion, and I think we’re all going to be hearing a lot more about feature stores in the year ahead.

As we rolled towards the end of Day 1, we had the great opportunity to hear from Dr. Jennifer Prendki, the founder and CEO of Alectio. Before founding Alectio, Jennifer was the VP of Machine Learning at Figure Eight, built the first ML department from scratch at Atlassian, and pioneered MLOps at Walmart Labs. Dr. Prendki and her team are challenging the long-held belief that more data is a prerequisite to increasing the performance of an ML model. To explain her thesis that more data is not necessarily better, she unpacked what she refers to as “Data Prep Ops.” After walking through Data Prep Ops in more detail than we can cover here, she summarized with a few major points:

Good data preparation is a prerequisite for doing ML well;
There is a Data Prep Ops market that is misunderstood and we as community members need to make it a first-class citizen in our MLOps practices;
Data Preparation is more than labeling – It is a multi-faceted set of complex operational processes that are effectively their own discipline;
Data prep cannot be separated from the machine learning process; the two processes are related.

It was a great discussion, with lots of food for thought for practitioners out there wrestling with the “more data = better predictions” status quo.

In keeping with the TWIMLconnect theme at this year’s event, attendees had an opportunity to participate in a networking activity towards the end of the day. Attendees were randomly grouped into small breakout rooms for four lightning getting-to-know-you rounds. With smiles all the way around and folks complaining that the rounds were too short, it was clear everyone had a great time.

We wrapped up the day with Jeff Fletcher, a Cloud Machine Learning Specialist from Cloudera, who walked everybody through a workshop exploring how ML can be done on the Cloudera Data Platform, including data preparation, pipelines, and production deployment. Jeff was clearly in his element and happy to show off the power of their platform.

Tomorrow, we have a full schedule with:

A keynote interview with Faisal Siddiqi, Director of Engineering at Netflix;
Todd Underwood, an Engineering Director at Google, will discuss what happens when “Good Models Go Bad”;
Dotan Asselman, Co-Founder/CTO of theator, and Ariel Biller, Evangelist for ClearML, will talk about continuous training;
Chip Huyen (who wrote multiple amazing surveys of the MLOps market) will talk about the move to real-time ML;
Monte Zweben, CEO of Splice Machine, will discuss how you can scale models by moving beyond traditional database architectures and combining operational, analytical, and feature store databases onto a common platform;
Jeff Fletcher from Cloudera will close out the day with a continued look into the power of the Cloudera Data Platform.

If this sounds interesting, it’s not too late to register! There are still seven more days of sessions, including Friday’s Executive Summit. Pro Plus and Executive passes provide ongoing access to the conference recordings so that you can catch up after the event.
