Stealing Part of a Production Language Model with Nicholas Carlini

EPISODE 702

September 23, 2024

About this Episode

Today, we're joined by Nicholas Carlini, a research scientist at Google DeepMind, to discuss adversarial machine learning and model security, focusing on his ICML 2024 best paper award winner, “Stealing Part of a Production Language Model.” We dig into this work, which demonstrated that the last layer of production language models, including ChatGPT and PaLM-2, can be successfully stolen. Nicholas describes the current landscape of AI security research in the age of LLMs, the implications of model stealing, ethical concerns surrounding model privacy, how the attack works, and the significance of the embedding layer in language models. We also discuss the remediation strategies that OpenAI and Google implemented, and future directions in the field of AI security. Finally, we cover his other ICML 2024 best paper, “Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining,” which questions the use and promotion of differential privacy in conjunction with pretrained models.

Related Episodes

LLMs for Equities Feature Forecasting at Two Sigma

RAG Risks: Why Retrieval-Augmented LLMs Are Not Safer

CTIBench: Evaluating LLMs in Cyber Threat Intelligence

Exploring the “Biology” of LLMs with Circuit Tracing

Teaching LLMs to Self-Reflect with Reinforcement Learning

One Response

Ryan Hale says:

September 24, 2024 at 2:39 pm

I am new to this podcast series and really appreciate how much I am learning from your interviews…thank you!
The question that I wish you asked Nicholas is “are you using LLMs to discover new and more effective ways to attack LLMs?”