Assessing Data Quality at Shopify with Wendy Foster
EPISODE 592 | SEPTEMBER 19, 2022
About this Episode
Today we’re back with another installment of our Data-Centric AI series, joined by Wendy Foster, a director of engineering & data science at Shopify. In our conversation with Wendy, we explore the differences between data-centric and model-centric approaches and how they manifest at Shopify, including on her team, which is responsible for utilizing merchant and product data to assist individual vendors on the platform. We discuss how they address, maintain, and improve data quality, emphasizing the importance of data coverage and “freshness” when solving constantly evolving use cases. Finally, we discuss how data is taxonomized at the company, the challenges that arise when producing large-scale ML models, and the future use cases that Wendy expects her team to tackle. We also briefly explore Merlin, Shopify’s new ML platform (that you can hear more about at TWIMLcon!), and how it fits into the broader scope of ML at the company.
About the Guest
Wendy Foster
Shopify
Resources
- Blog: The Magic of Merlin: Shopify's New Machine Learning Platform
- Blog: Using Rich Image and Text Data to Categorize Products at Scale
- Blog: Building a Real-time Buyer Signal Data Pipeline for Shopify Inbox
- Article: On AI Ethics: Wendy Foster, Director Of Engineering And Data Science At Shopify
- Fighting Fraud with Machine Learning at Shopify with Solmaz Shahalizadeh - #60
- TWIMLcon: AI Platforms 2021 Keynote Interview: Solmaz Shahalizadeh
