Secrets of a Kaggle Grandmaster | TWIML - The Voice of Machine Learning & AI

What a start to week two of TWIMLcon 2021!

Today’s sessions featured speakers from Wikimedia, Prosus Group, Palo Alto Networks, Clorox, Dataiku, Janssen Pharmaceutical Companies, iRobot, Algorithmia, and ClearML sharing their thoughts on building and running data science and ML platforms. We also got an overview of major themes and trends in machine learning for 2021. Without further ado, let’s review.

The day started with a chat with Chris Albon, Director of Machine Learning at the Wikimedia Foundation, the organization responsible for Wikipedia. Chris shared a lot about what it’s like to build an ML team that supports one of the largest websites in the world, and to do so transparently and entirely with open source. He had a lot of great advice on choosing open source technologies backed by vibrant communities of developers who have thought hard about the problem space and the solutions their users need. He also had some thoughts about how we might want to adjust the way we think about models:

“Now it’s about productization and treating models less like a crystal chandelier and more like a disposable coffee cup. If you find a better one, use it and throw away the old one.”

He also touched on one of the themes that has been emerging through the conference, that of the “full-stack engineer” vs specialization, and made the case that full-stack may be in vogue but that he really valued specialization.

Next up, we heard from Paul van der Boor, Senior Director of Data Science at Prosus Group. Paul jokingly refers to Prosus as the biggest internet company that you’ve never heard of. They are a global consumer internet group and one of the largest technology investors in the world. Their portfolio of companies serves more than 1.5 billion people in more than 80 countries and covers classifieds, payment/fintech, food, and education.

Paul shared the ML platform architectures of three of their portfolio companies: OLX, iFood, and Swiggy. These architecture reviews were extremely informative; check out the replay for full details.

From looking at and working with all these companies and more, he and his team have extracted a set of general principles that they recommend when making platform decisions:

Architect for change - the user interface should be a separate layer of abstraction from the infrastructure below it so you have flexibility to change the underlying infrastructure and components.
Use multiple components in parallel - try new things under the covers and see what works.
Don’t reinvent the wheel - if an off-the-shelf component works, consider using it rather than building your own. There are many good components, tools, platforms, and services available now, and there is no reason not to use them as long as they’re abstracted from the user interface (point one above).
Use tools that will scale to the degree you need them to scale.
Take the MLOps perspective and build for the long haul.

What great advice.
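
To make the first principle, “architect for change,” a little more concrete, here is a minimal Python sketch. It is not taken from Paul’s talk or from any Prosus architecture; every name in it (ModelBackend, InHouseBackend, OffTheShelfBackend, Platform) is hypothetical and the “models” are toy stand-ins. The only point is that callers talk to one stable interface while the component underneath can be swapped.

```python
from typing import Protocol


class ModelBackend(Protocol):
    """Hypothetical infrastructure contract: any backend must offer predict()."""

    def predict(self, features: list[float]) -> float: ...


class InHouseBackend:
    """Toy stand-in for a home-grown serving stack (returns the mean)."""

    def predict(self, features: list[float]) -> float:
        return sum(features) / len(features)


class OffTheShelfBackend:
    """Toy stand-in for a bought or open source component (returns the max)."""

    def predict(self, features: list[float]) -> float:
        return max(features)


class Platform:
    """User-facing layer: callers only ever see score()."""

    def __init__(self, backend: ModelBackend) -> None:
        self._backend = backend

    def swap_backend(self, backend: ModelBackend) -> None:
        # The infrastructure changes here; the user-facing API does not.
        self._backend = backend

    def score(self, features: list[float]) -> float:
        return self._backend.predict(features)


platform = Platform(InHouseBackend())
first = platform.score([1.0, 2.0, 3.0])

platform.swap_backend(OffTheShelfBackend())
second = platform.score([1.0, 2.0, 3.0])
```

Code that calls platform.score() is untouched by the swap, which is what makes it cheap to trial an off-the-shelf component next to a home-grown one.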

Continuing the theme of AI operationalization, we had a panel discussion with Rasool Tahmasbi (Lead Data Scientist, Palo Alto Networks), Sarah Cullem (Director, Head of DTC Analytics & Data Science, Clorox), and Mike Becker (Data Scientist, The Janssen Pharmaceutical Companies of Johnson & Johnson), led by Conor Jensen (Director of AI Consulting and Data Science, Dataiku).

Some highlights from the panel:

Sarah made the case that simplification of tools and technologies allows your team to focus on solving the real business problems. She shared three keys to success:

Set realistic expectations;
Clearly define shared terms;
Be clear about whether the output will be used only by machines or by machines AND humans, so that you can deliver what the business needs.

Conor Jensen outlined patterns of success that he has seen while working with customers across many industries:

Create a Center of Excellence (CoE) that sets up processes and tools;
Use prototypes to help get buy-in from leadership;
Recognize that the UI that people use to interact with your model…might be as important as the model itself.

Rasool and Mike agreed on using POCs (proofs of concept) as a means to get buy-in and emphasized going for “small early quick wins.”

Next up was one of our “Team Teardowns,” this time featuring a discussion with iRobot ML team members Danielle Dean (Technical Director of Machine Learning), Mathew Salvari (Lead Principal Machine Learning Scientist), and Mohan Muppidi (ML Cloud Architect).

We asked them their thoughts on structure vs. flexibility and found that they were VERY strong proponents of structure. It appears that they are very prescriptive in their tools, infrastructure, and processes and they shared a pretty strong belief that standardization simplifies collaboration and speeds up development.

One interesting tip that Danielle raised when we discussed maintaining team alignment was their “hub-and-spoke” model. In this approach, a central ops team sets standards, and each major product team also has somebody with operational skills, so the two can work together to stay aligned on standards and practices.

The last session of the day before the networking and workshop was a talk by Diego Oppenheimer, CEO of Algorithmia. He shared the results of the company’s third annual ML survey, titled “2021 Enterprise Trends in Machine Learning.” The survey identified 10 key trends across four main themes, covering budgets, use cases, model counts, governance, technology integration, organizational alignment, model deployment times, deployment challenges, and the costs of build vs. buy. We won’t steal his thunder here; we suggest that everybody go grab the full report. You can find it at: https://tinyurl.com/twiml.

Diego’s passion for ML and MLOps was clear and he had a few great quotes:

Scaling the greatest technology: “This is the greatest technology of our lifetime, now it’s about getting the tools to be able to do it at scale”
MLOps is about Speed: “I like to think about it as building a high speed highway. The existence of the highway doesn’t mean there aren’t controls (tolls, highway patrol) but it allows cars to move faster between destinations.”
Focus on the business impact, not on DIY: “We are builders - building is exciting. But what’s even more exciting is moving the needle for a business, so focusing on that is the best way to increase focus (and funding) for ML efforts.”

Following a brief networking session, Ariel Biller from ClearML wrapped up our day by providing his thoughts on the state of ML and MLOps, stating that “ML/DL research is inherently messy. MLOps (automation, orchestration, reproducibility, and workflow integration) is the missing element.”

Before launching into a thorough demo of ClearML, he shared the 4-step process that he believes most ML teams go through when trying to architect and build their ML Platform. We think it was only half in jest.

We don’t need one;
We’ll build it ourselves;
We’ll build it again, but right this time;
Let’s go get one that’s already built.

Huge shout-out to Chris, Paul, Rasool, Sarah, Conor, Mike, Mathew, Mohan, Danielle, Diego, and Ariel for their insights and humor in today’s sessions.

If you missed the conference today, it’s not too late to register for TWIMLcon! There are still two more days of sessions as well as an unconference. Even if you have missed most of the conference, it’s not too late to sign up now and get access to all the conference session replays.