Distilling Transformers and Diffusion Models for Robust Edge Use Cases | TWIML

About this Episode

Today, we're joined by Fatih Porikli, senior director of technology at Qualcomm AI Research, for an in-depth look at several of Qualcomm's accepted papers and demos featured at this year’s CVPR conference. We start with “DiMA: Distilling Multi-modal Large Language Models for Autonomous Driving,” an end-to-end autonomous driving system that distills multi-modal large language models for structured scene understanding and safe motion planning in critical “long-tail” scenarios. We explore how DiMA uses LLMs' world knowledge and efficient transformer-based models to significantly reduce collision rates and trajectory errors. We then discuss “SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation,” a diffusion-distilled approach that combines generative models with metric depth estimation to produce sharp, accurate monocular depth maps. Fatih also shares a look at Qualcomm’s on-device demos, including text-to-3D mesh generation, real-time image-to-video and video-to-video generation, and a multi-modal visual question-answering assistant.
