Neural Network Quantization and Compression with Tijmen Blankevoort
EPISODE 292 | AUGUST 19, 2019
About this Episode
Today we're joined by Tijmen Blankevoort, a staff engineer at Qualcomm, where he leads the compression and quantization research teams.
Tijmen was also CTO of the ML startup Scyfer, which he co-founded with Qualcomm colleague Max Welling, whom we spoke with back on episode 267. In our conversation, we discuss the ins and outs of compressing and quantizing ML models, including how much models can actually be compressed and the best ways to achieve it. We also look at the recent "Lottery Ticket Hypothesis" paper and how it factors into this research, along with best practices for training efficient networks. Finally, Tijmen recommends a few techniques for those interested in getting started, including tensor factorization and channel pruning.
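For listeners new to the topic, here is a minimal NumPy sketch of uniform (affine) 8-bit weight quantization, the basic operation underlying the post-training methods discussed in the episode and the papers below. This is an illustrative example, not code from the episode or the referenced papers, and the function names are our own.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, num_bits: int = 8):
    """Uniformly quantize a float tensor to num_bits unsigned integers (affine scheme)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    # Scale maps the float range onto the integer grid; guard against constant tensors.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    # Zero point is the integer that represents float 0.0 exactly.
    zero_point = int(round(qmin - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map integers back to floats; the round-trip error is the quantization noise."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: quantize random "weights" and measure the reconstruction error.
w = np.random.randn(64, 64).astype(np.float32)
q, s, z = quantize_uniform(w, num_bits=8)
w_hat = dequantize(q, s, z)
print("max abs error:", np.abs(w - w_hat).max())
```

Methods like the data-free quantization work listed under Resources are aimed at keeping this round-trip error from degrading model accuracy, without requiring retraining or access to training data.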
About the Guest
Tijmen Blankevoort
Qualcomm
Resources
- Paper: Relaxed Quantization for Discretized Neural Networks
- Paper: Data-Free Quantization through Weight Equalization and Bias Correction
- Paper: EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
- Paper: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications