Gen AI at the Edge: Qualcomm AI Research at CVPR 2024 with Fatih Porikli
EPISODE 688 | JUNE 10, 2024
About this Episode
Today we’re joined by Fatih Porikli, senior director of technology at Qualcomm AI Research. In our conversation, we covered several of the Qualcomm team’s 16 accepted main track and workshop papers at this year’s CVPR conference. The papers span a variety of generative AI and traditional computer vision topics, with an emphasis on increased training and inference efficiency for mobile and edge deployment. We explore efficient diffusion models for text-to-image generation, grounded reasoning in videos using language models, real-time on-device 360° image generation for video portrait relighting, a video-language model for situated interactions like fitness coaching, a visual reasoning model and benchmark for interpreting complex mathematical plots, and more! We also touched on several of the demos the team will be presenting at the conference, including multi-modal vision-language models (LLaVA) and parameter-efficient fine-tuning (LoRA) on mobile phones.
About the Guest
Fatih Porikli
Qualcomm
Thanks to our sponsor Qualcomm AI Research
Qualcomm AI Research is dedicated to advancing AI to make its core capabilities — perception, reasoning, and action — ubiquitous across devices. Their work makes it possible for billions of users around the world to have AI-enhanced experiences on devices powered by Qualcomm Technologies. To learn more about what Qualcomm Technologies is up to on the research front, visit twimlai.com/qualcomm.
Resources
- DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions
- Clockwork Diffusion: Efficient Generation With Model-Step Distillation
- AugFlow: Free Optical Flow Augmentation by Robust Occlusion-Aware Video Frame Interpolation
- Low-Latency Neural Stereo Streaming
- AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement
- ELVM: Efficient Large Vision Models (CVPR site)
- OmniCV (Omnidirectional Computer Vision)
- On Speculative Decoding for Multimodal Large Language Models
- SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations
- EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting
- MMFM2: Look, Remember and Reason: Grounded Reasoning in Videos with Language Models
- 2nd Workshop for Learning 3D with Multi-View Supervision (3DMV)
- MonoSelfRecon: Purely Self-Supervised Explicit Generalizable 3D Reconstruction of Indoor Scenes from Monocular RGB Views
- Data Augmentation and Optimized Architectures for Computer Vision with Fatih Porikli - #635
- Optical Flow Estimation, Panoptic Segmentation, and Vision Transformers with Fatih Porikli - #579
- Quantizing Transformers by Helping Attention Heads Do Nothing with Markus Nagel - #663
- What’s Next in LLM Reasoning? with Roland Memisevic - #646