Qualcomm's AI250 Attacks the AI Inference Memory Bottleneck

AI inference is hitting a memory wall. Qualcomm is attacking the Total Cost of Ownership (TCO) of AI inference head-on with its new AI200 and AI250 data center solutions, which promise a projected 10x leap in effective memory bandwidth. I sat down with Durga Malladi, SVP & GM of Technology Planning, Edge Solutions, and Data Center at Qualcomm, for a deep dive into the new hardware.

We discuss how the AI250's "Near-Memory Computing" architecture goes after the token generation bottleneck, specifically the memory-bandwidth-bound "decode" phase, to boost tokens-per-second and drive down TCO. This isn't just about raw performance; it's also a strategic "mix and match" play: Durga explains how hyperscalers and cloud service providers (CSPs) can integrate the AI250's capabilities alongside their own custom silicon, offering a new level of flexibility.

We also cover how Qualcomm is leveraging its decades of experience with the Hexagon NPU (from phones to automotive) to build these new, highly scaled, direct liquid-cooled rack-level solutions.
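For intuition on why decode is the target: generating each token requires streaming the model's weights (plus KV cache) from memory, so single-stream decode throughput is roughly bounded by memory bandwidth divided by bytes read per token. Here is a minimal back-of-envelope sketch of that bound; the model size, baseline bandwidth, and byte counts below are illustrative assumptions, not Qualcomm specifications.

```python
# Back-of-envelope: decode-phase tokens/sec is roughly memory-bandwidth-bound.
# All numbers are illustrative assumptions, not Qualcomm specifications.

def decode_tokens_per_sec(bandwidth_gb_s: float,
                          model_params_billions: float,
                          bytes_per_param: float = 2.0) -> float:
    """Upper bound on single-stream decode throughput: each token streams
    every weight once, so tokens/sec <= bandwidth / model bytes."""
    model_bytes = model_params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

baseline_bw = 2_000                 # GB/s, hypothetical baseline accelerator
near_memory_bw = 10 * baseline_bw   # the projected 10x effective bandwidth
params = 70                         # a 70B-parameter model at FP16

print(f"baseline:   {decode_tokens_per_sec(baseline_bw, params):.1f} tok/s")
print(f"10x memory: {decode_tokens_per_sec(near_memory_bw, params):.1f} tok/s")
```

The arithmetic makes the strategy legible: once decode is bandwidth-bound, raising effective memory bandwidth translates almost linearly into tokens-per-second, which is exactly the lever a near-memory design pulls.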
