A conversation with Ziad Asghar

Qualcomm Technologies provides connectivity and intelligence for a wide variety of devices, spanning mobile to vehicles to robotics and beyond. The company began its first AI project in 2007 and has since launched 15 platforms and systems with AI on board. To support this innovation, the company invests in fundamental AI research in areas like neural compression, perception, reinforcement learning, and federated learning; this research informs and powers its efforts to bring new technologies to market across a variety of use cases.

I recently spoke with Qualcomm’s vice president of product management, Ziad Asghar, to catch up on the company’s efforts to make the devices we interact with every day smarter, more capable, and more useful.

Improving 5G with AI

In a traditional 4G or 5G modem, the radio settings are hardwired in, so the radio cannot readily adapt to changing channel conditions like interference or reflections from buildings you’re passing. 

What if, though, we could use machine learning to predict the right channel parameters from time-series measurements of the channel itself? That would open up the possibility of dynamically improving communications between the transmitter and receiver, resulting in greater bandwidth and throughput between them.
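To make the idea concrete, here is a minimal sketch of what such a predictor could look like. This is my own illustration rather than Qualcomm’s design, and the input features, window length, and model size are all assumptions.

```python
# Minimal sketch (not Qualcomm's implementation): predict the next
# channel-quality value from a short window of past channel measurements.
import torch
import torch.nn as nn

class ChannelPredictor(nn.Module):
    def __init__(self, n_features: int = 4, hidden: int = 32):
        super().__init__()
        # GRU over a time series of per-slot measurements
        # (e.g., SNR, interference level, Doppler estimate, signal power).
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # predicted channel quality

    def forward(self, x):  # x: (batch, time_steps, n_features)
        _, h = self.rnn(x)
        return self.head(h[-1])  # (batch, 1)

# Toy usage: a batch of 8 windows, each 50 time steps of 4 measurements.
model = ChannelPredictor()
window = torch.randn(8, 50, 4)
predicted_quality = model(window)  # shape: (8, 1)
```

In a real modem, predictions like this would feed back into how the link is configured, which is where the bandwidth and throughput gains would come from.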

It turns out that this can work, and it’s been an active area of research for Qualcomm for some time now.

“We started working on the research aspect of [looking at] all of the different vectors in the modem where AI could enhance performance and now very quickly, we’re actually able to bring that to a product. We have a very quick feedback loop from research to commercialization.”

According to Ziad, this new technology has been incorporated into the new Snapdragon X70 5G Modem-RF platform, unveiled at this year’s Mobile World Congress. Ziad noted that this system, which supports every commercial 5G band from 600 MHz to 41 GHz, contains the world’s first 5G AI processor aimed at improving speed, coverage, latency, and power efficiency. The processor offers AI-based channel-state feedback and dynamic optimization, beam management, network selection, and adaptive antenna tuning, resulting in better quality communications in the most challenging channel conditions. This use case is also a great example of how, as Ziad explained, Qualcomm looks at AI as a horizontal enabler that touches all the other technologies in their portfolio.

I reviewed this work from a research perspective with Ziad’s colleague Joseph Soriaga about six months ago, but I had no idea until speaking with Ziad that it was so close to productization. Coming from a network comms background many years ago, I found that conversation super interesting. Be sure to check it out if you’d like to go through the details.

Autonomous Vehicles and Robotics

Another key area for Qualcomm is the application of AI technologies to connected and autonomous vehicles.

One specific area that Ziad touched on is multi-sensor fusion, also known as sensor-data integration: the challenge of integrating, or fusing, the data from a vehicle’s many different sensors, such as LIDAR, radar, sonar, GPS, and cameras, to provide a more complete picture of the vehicle’s context. Better, more comprehensive sensor information helps self-driving systems understand the speed, location, and trajectory of the car and the objects around it during the perception phase of autonomous driving, which in turn leads to better performance in the subsequent planning and control phases.
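As a deliberately simplified illustration of the fusion idea (my own sketch, not the Snapdragon Ride implementation), independent range estimates from different sensors, each with its own uncertainty, can be combined by inverse-variance weighting, the basic building block behind Kalman-style fusion. The sensors, numbers, and noise figures below are made up.

```python
# Toy sensor-fusion example (illustrative only): combine noisy 1-D
# distance estimates from several sensors by inverse-variance weighting.
import math

# (estimate in metres, variance in metres^2) -- made-up numbers
measurements = {
    "gps":   (102.0, 4.0),
    "radar": (100.5, 1.0),
    "lidar": (100.2, 0.25),
}

weights = {name: 1.0 / var for name, (_, var) in measurements.items()}
fused = sum(w * measurements[name][0] for name, w in weights.items()) / sum(weights.values())
fused_var = 1.0 / sum(weights.values())

print(f"fused distance: {fused:.2f} m (std dev {math.sqrt(fused_var):.2f} m)")
```

Assuming independent sensor noise, the fused estimate is more certain than any single sensor’s reading, which is the benefit Ziad described.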

Sensor fusion is one of many capabilities of the Snapdragon Ride Platform, Qualcomm’s hardware and software platform for advanced driver assistance systems and automated driving.

Ziad noted that earlier in 2022, Qualcomm acquired Arriver, and subsequently integrated the company’s computer vision software stack into the Snapdragon Ride platform, which is itself part of the larger Snapdragon Digital Chassis platform that includes cockpit functions, connectivity, car-to-cloud, and positioning systems.

We also discussed the development of a “shadow mode” that helps improve autonomous vehicle training. Think imitation learning. A vehicle in shadow mode ingests all of its sensor data, builds its own perception of its environment, and plans its actions, but never actually executes them; instead, it compares that hypothetical set of actions to what the driver actually did.

“You almost need to get to a point where you want all the data from the vehicle going to an AI, and then looking at the decision that AI would have made and compare that to what the human actually did.”

This comparison between the driver’s behavior and the system’s decisions can be used to validate those decisions and identify those edge cases where the driver and the driving system would have made different choices.
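A minimal sketch of that comparison loop might look like the following. The data structures, planner interface, and disagreement metric are hypothetical stand-ins of my own, not Qualcomm’s code.

```python
# Hypothetical sketch of a "shadow mode" loop: the planner proposes
# actions it never executes, and large disagreements with the human
# driver are logged as candidate edge cases.
from dataclasses import dataclass

@dataclass
class Action:
    steering: float  # radians
    throttle: float  # 0..1

def disagreement(planned: Action, human: Action) -> float:
    # Simple weighted distance between the two actions.
    return abs(planned.steering - human.steering) + 0.5 * abs(planned.throttle - human.throttle)

def shadow_mode(drive_log, planner, threshold=0.2):
    """drive_log: iterable of (sensor_frame, human_action) pairs."""
    edge_cases = []
    for sensor_frame, human_action in drive_log:
        planned = planner(sensor_frame)  # perceive and plan, but never act
        if disagreement(planned, human_action) > threshold:
            edge_cases.append((sensor_frame, planned, human_action))
    return edge_cases  # candidates for retraining and validation
```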

Beyond autonomous driving, Ziad sees the work being done in that space as benefiting industrial robotics users as well:

“Autonomy or the autonomous driving aspect is going to be there much earlier in an industrial setup than it’s going to be on the road – it’s a more constrained problem. We can tie in this multi-modal sensor fusion idea from the automotive space and do it on mobile industrial robots.”

XR and the Metaverse

We then moved the discussion towards the “XR” space, which spans augmented reality (AR), virtual reality (VR), and mixed reality (MR). Ziad shared with me that they are leveraging the underlying approaches used for portrait mode on smartphone cameras (the depth-of-field “bokeh” effect) in their work on XR wearables. That technology allows them to separate foreground subjects from backgrounds, a fundamental building block for something like AR glasses. He shared more on this:

[For AR glasses] “you need to be able to identify all the objects, do pristine 3D reconstruction, do very exact hand-tracking to be able to point to something in your field of view. Gestures become critical and doing that accurately matters. That’s all AI…You have to do plane detection so that you can identify a table and put virtual objects on it, all of which leverages the depth information we have.”
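To make the portrait-mode connection concrete, here is a small sketch of depth-based foreground separation. It is my own illustration, not Qualcomm’s pipeline, and the depth threshold, blur amount, and random stand-in data are assumptions.

```python
# Illustrative sketch (not Qualcomm's pipeline): use an aligned depth map
# to separate the foreground subject from the background, as in
# smartphone portrait mode, then soften everything behind the subject.
import numpy as np
from scipy.ndimage import gaussian_filter

def portrait_composite(rgb, depth, fg_max_depth=1.5, blur_sigma=5.0):
    """rgb: (H, W, 3) float image; depth: (H, W) per-pixel depth in metres."""
    foreground_mask = (depth < fg_max_depth)[..., None]           # (H, W, 1)
    blurred = gaussian_filter(rgb, sigma=(blur_sigma, blur_sigma, 0))
    return np.where(foreground_mask, rgb, blurred)                # keep subject sharp

# Toy usage with random arrays standing in for a camera frame and depth sensor.
rgb = np.random.rand(480, 640, 3)
depth = np.random.uniform(0.5, 5.0, size=(480, 640))
composited = portrait_composite(rgb, depth)
```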

Here again, we touched on the theme of power management and efficiency. Ziad shared how they’re using approaches that deliver high-resolution imagery exactly where a user is looking while pushing lower-resolution imagery to the user’s peripheral vision. This set of techniques, known as foveated rendering, is key to delivering AR/VR imagery efficiently.
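As a rough sketch of the idea, ignoring the eye tracking and GPU integration a real headset needs (and with made-up resolutions and region sizes), a renderer could keep full resolution only in a small region around the gaze point and show the rest of the frame at reduced resolution:

```python
# Simplified illustration of foveated rendering (not a real renderer):
# full resolution near the gaze point, quarter resolution elsewhere.
import numpy as np

def foveate(frame, gaze_xy, radius=80, factor=4):
    """frame: (H, W, 3) array; gaze_xy: (x, y) pixel the user is looking at."""
    h, w, _ = frame.shape
    # Cheap low-resolution version of the whole frame (nearest-neighbour).
    low = frame[::factor, ::factor]
    low = np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)[:h, :w]
    out = low.copy()
    # Paste the full-resolution pixels back in a box around the gaze point.
    x, y = gaze_xy
    y0, y1 = max(0, y - radius), min(h, y + radius)
    x0, x1 = max(0, x - radius), min(w, x + radius)
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]
    return out

frame = np.random.rand(720, 1280, 3)
foveated = foveate(frame, gaze_xy=(640, 360))
```

Because most of the frame is produced at a fraction of the resolution, the compute and memory-bandwidth savings can be substantial, which is where the power efficiency comes from.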

This theme of “do more with less” continued as we discussed how Qualcomm is working to build natural language processing capabilities into their platforms. Many foundational language models, such as BERT, are simply too large to run on mobile devices, so Qualcomm has been working with Hugging Face to bring DistilBERT to those platforms. DistilBERT is a compressed BERT language model that is 40% smaller and 60% faster while retaining 97% of BERT’s language understanding capabilities. I asked Ziad what they would do with these types of capabilities, and he gave me an example:

“At the end of last year, in an engagement with Hugging Face, we showed the emotional read-out (AKA sentiment analysis) of a particular message. In AR, you may not want to go through a whole list of notifications, but with AI, we can figure out which one of those notifications are key and bring that up to the user.”

Building on that, we discussed a variety of other use cases such as having your AR glasses summarize a document sent to you, prioritize incoming notifications, or provide you with sentiment analysis data on conversations you’re having with other people.
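As a concrete (if simplified) example of what that kind of notification triage could look like, the snippet below uses the Hugging Face transformers pipeline with an off-the-shelf DistilBERT sentiment model to score a few hypothetical notifications. The messages and the ranking rule are made up, and this is not Qualcomm’s on-device stack.

```python
# Hypothetical notification triage using a DistilBERT sentiment model
# via Hugging Face transformers (illustrative only).
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

notifications = [
    "Your package has been delivered.",
    "URGENT: the build is failing and the release is blocked.",
    "Lunch tomorrow?",
]

scores = classifier(notifications)
# Surface negative, high-confidence items first.
ranked = sorted(
    zip(notifications, scores),
    key=lambda pair: (pair[1]["label"] == "NEGATIVE", pair[1]["score"]),
    reverse=True,
)
for text, result in ranked:
    print(f"{result['label']:8s} {result['score']:.2f}  {text}")
```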

Of course, none of this can happen without their customers building on these platforms so we turned to the developer ecosystem. Ziad shared with me that they had recently launched their new XR development platform, Snapdragon Spaces, which is designed to enable developers to create immersive experiences for AR glasses.

Given the breadth of Qualcomm’s research and technology across the various areas supporting XR, such as connectivity, NLP, echo and noise cancellation, automated speech recognition, and more, it will be quite interesting to see the types of XR applications that they and their customers develop as organizations get more serious about the Metaverse.

Computer Vision

As we wound down the discussion on XR, I asked Ziad to share his thoughts on progress in the field of computer vision. He noted that there are still plenty of interesting problems to keep researchers and practitioners busy:

“You can envision using AI to do auto-zooming in such a way that if you are filming a child running around a field, that the camera can zoom in and out to keep the child’s height the same, or you can do auto-framing of people on video conferences. As we do more augmented reality, we will have video and camera capture of those virtual objects. All the work that has been done on a single photo frame can now be applied to video at 60fps. There are so many opportunities to innovate.”

Another area where Ziad feels there is more work to do is making cameras not just for people but for robots and autonomous vehicles:

“When we optimize for human vision, we do segmentation, we make the skin look nice, and we have specific processing for all this. But within an autonomous vehicle context it’s not about making the picture look nice. It’s about knowing what’s in the frame. We might have a fully lit frame where half of the frame could be in the shade. We still need to be able to see what’s in the shade to create a safe experience for the car. We need to apply different sets of techniques for AV vision than for human vision.”

Edge / IoT

AR glasses and computer vision were a great launching point for a more general discussion of what’s happening in the arena of the “intelligent edge.” When I asked Ziad for his thoughts on how edge and AI are related, he called out three distinct phases of evolution:

Phase 1: Training and inference in the cloud.
Phase 2: Training in the cloud and inference on the edge.
Phase 3: Training in the cloud plus additional training and inference on the edge.

He believes that vendors operating in Phase 3 will be able to provide a lot more personalization through on-device learning and training that could result in an NLP system that really understands your specific accent, or an autonomous vehicle that understands the unique way you drive your own car, just to name a couple of examples.

I asked Ziad how that personalized learning at the edge makes its way back upstream to benefit the rest of the system, devices, and users. In other words, how does that personalized on-device learning get federated back to the greater whole? He agreed that a performance improvement learned on one device for one user could be abstracted and generalized for other users, but it needs to be done in a way that does not leak personally identifiable information about the original user. Without preserving individual user privacy, Ziad felt that federated learning may not achieve its potential.
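A stripped-down sketch of the federated averaging idea underlying that exchange, my own illustration rather than Qualcomm’s system: each device trains on its own private data and sends back only model updates, which a server averages into the shared model.

```python
# Minimal federated averaging sketch (illustrative only): devices share
# model updates, never their raw local data.
import numpy as np

def local_update(global_weights, local_data, lr=0.1):
    # Stand-in for on-device training: one gradient step of a linear
    # model y ~ x @ w on the device's private data.
    x, y = local_data
    grad = 2 * x.T @ (x @ global_weights - y) / len(y)
    return global_weights - lr * grad

def federated_round(global_weights, devices):
    updates = [local_update(global_weights, data) for data in devices]
    return np.mean(updates, axis=0)   # FedAvg: average the device models

# Toy usage: three devices, each with its own private dataset.
rng = np.random.default_rng(0)
devices = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, devices)
```

The raw data never leaves the device; protecting what the shared updates themselves might reveal is the privacy challenge Ziad alluded to.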

We closed this part of the discussion with Ziad sharing some thoughts on “sensing hubs.” He noted:

“We have the big engine that we have in our products, but we also have been recently focusing on what we call a sensing hub or a very small AI engine. You can envision audio, video, location, modem data, camera data, coming into that sensing hub. Think about your car when it’s parked. You want to be able to make sure that the cameras around it are working, but they’re working in a way that they’re just sipping on the power, especially in an electric vehicle context.”

The general idea is that if they can reduce the power consumption of AI at the edge sufficiently, it could enable a kind of ambient intelligence that is constantly learning and improving based on the data available to it.

Summary

With all of the opportunities we discussed, I still wanted to know what excited Ziad and his team the most. He replied with:

“The opportunities that the metaverse brings or Advanced Driver Assistance Systems (ADAS) are all coming and they’re demanding a lot more from us. Cognitive AI or the ability to bring in multi-modal data (sensor fusion) will yield huge benefits. The benefits in automotive and AR-VR will be huge. On-device learning. These are all areas we will be spending a lot of our time on going forward and that we’re most excited about.”

Check out the video for the full conversation with Ziad or these recent interviews with Qualcomm researchers for a more technical discussion of some of the technologies he and I touched on: