Intelligent content that gives practitioners, innovators and leaders an inside look at the present and future of ML & AI technologies.

LATEST
EPISODE 728 | April 23, 2025
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications.

LATEST REPORT

Retrieval-augmented generation promised to bring ChatGPT’s magic to enterprise data. But while organizations rushed to build chatbots, they often struggled to deliver real business value. This comprehensive guide reveals RAG’s full potential beyond conversational interfaces.

Community

The TWIML Community is a global network of machine learning, deep learning and AI practitioners and enthusiasts.

We organize ongoing educational programs, including study groups for several popular ML/AI courses such as Fast.ai Deep Learning, Machine Learning and NLP, Stanford CS224N, Deeplearning.ai and more. We also host several special interest groups focused on topics like Swift for TensorFlow and competing in Kaggle competitions.

Work with Us

TWIML creates and curates intelligent content that helps makers build better experiences for their users, and gives executives an inside look at the real-world application of intelligence technologies. We also build and support communities of innovators who are as excited about these technologies as we are. We advise a variety of leading organizations as well, helping to craft strategies for taking advantage of the vast opportunities created by ML and AI.