Block-Sparse Kernels for Deep Neural Networks with Durk Kingma

This Week in Machine Learning & AI

This show is part of a series that I’m really excited about, in part because I’ve been working to bring it to you for quite a while now. The focus of the series is a sampling of the interesting work being done over at OpenAI, the independent AI research lab founded by Elon Musk, Sam Altman and others. This episode features Durk Kingma, a Research Scientist at OpenAI. Although Durk is probably best known for his pioneering work on variational autoencoders, he joined me this time to talk through his latest project on block-sparse kernels, which OpenAI just published this week.

Block sparsity is a property of certain neural network representations, and OpenAI’s work on developing block-sparse kernels makes it more computationally efficient to take advantage of that property. In addition to covering block-sparse kernels themselves and the background required to understand them, we also discuss why they’re important and walk through some examples of how they can be used. I’m happy to present another fine Nerd Alert show to close out this OpenAI Series, and I know you’ll enjoy it!
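For readers who want a concrete picture of block sparsity before listening, here is a minimal sketch in plain Python. It is not OpenAI’s GPU implementation. The names `block_sparse_matmul`, `to_dense`, `dense_matmul`, `layout` and `blocks` are made up for this illustration. The idea shown is only that a weight matrix can be stored as a grid of small dense blocks plus a 0/1 layout, and that blocks marked 0 can be skipped entirely.

```python
import random


def block_sparse_matmul(x, blocks, layout, block_size):
    """Compute y = x @ W where W is stored as a grid of dense blocks.

    x      : minibatch as a list of rows, each of length len(layout) * block_size
    layout : 2D list of 0/1; layout[i][j] == 1 means block (i, j) of W is nonzero
    blocks : dict mapping (i, j) -> block_size x block_size list of lists
    """
    n_out = len(layout[0]) * block_size
    y = [[0.0] * n_out for _ in x]
    for i, layout_row in enumerate(layout):
        for j, present in enumerate(layout_row):
            if not present:
                continue  # zero block: no storage and no arithmetic
            block = blocks[(i, j)]
            for b, row in enumerate(x):
                for r in range(block_size):
                    xv = row[i * block_size + r]
                    for c in range(block_size):
                        y[b][j * block_size + c] += xv * block[r][c]
    return y


def to_dense(blocks, layout, block_size):
    """Expand the block representation into an ordinary dense matrix."""
    n_in = len(layout) * block_size
    n_out = len(layout[0]) * block_size
    w = [[0.0] * n_out for _ in range(n_in)]
    for (i, j), block in blocks.items():
        for r in range(block_size):
            for c in range(block_size):
                w[i * block_size + r][j * block_size + c] = block[r][c]
    return w


def dense_matmul(x, w):
    """Reference dense product, used only to check the sparse version."""
    return [[sum(xv * w[k][c] for k, xv in enumerate(row))
             for c in range(len(w[0]))] for row in x]


random.seed(0)
block_size = 2
layout = [[1, 0, 0],
          [0, 0, 1]]  # only 2 of the 6 blocks are nonzero
blocks = {(i, j): [[random.gauss(0, 1) for _ in range(block_size)]
                   for _ in range(block_size)]
          for i, row in enumerate(layout)
          for j, present in enumerate(row) if present}
x = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]

y_sparse = block_sparse_matmul(x, blocks, layout, block_size)
y_dense = dense_matmul(x, to_dense(blocks, layout, block_size))
```

In this toy layout the sparse routine touches a third of the blocks that the dense product does, and the two results agree up to floating-point rounding.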

Thanks to our Sponsor

Support for this OpenAI Series is brought to you by our friends at NVIDIA, a company which is also a supporter of OpenAI itself. If you’re listening to this podcast, you already know about NVIDIA and all the great things they’re doing to support advancements in AI research and practice. What you may not know is that the company has a significant presence at the NIPS conference going on this week in Long Beach, California, including four accepted papers. To learn more about the NVIDIA presence at NIPS head on over to, and be sure to visit them at the conference.

TWiML Online Meetup

The details of our next TWiML Online Meetup have been posted! On Wednesday, December 13th, at 3pm Pacific Time, we will be joined by Bruno Gonçalves, who will be presenting the paper “Understanding Deep Learning Requires Rethinking Generalization”. If you’re already registered for the meetup, you should have already received an invitation with all the details. If you still need to register for the meetup, head over to to do so. We hope to see you there!

About Durk

Mentioned in the Interview

    • Durk Kingma

      Hi Marc, dropout introduces sparsity in the neural net activations, while the kernels we introduce bring sparsity to the neural net weights; it’s a different kind of sparsity. Writing kernels that make use of activation sparsity (such as with binary dropout or ReLU nonlinearities) is trickier than weight sparsity, since in the former case the sparsity pattern differs per input example. In the case of weights, the sparsity pattern is shared across input examples (across the minibatch), so it is much easier to efficiently parallelize across GPU cores.
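The distinction drawn in the comment above can be illustrated with a small sketch. This is an editorial illustration in plain Python, not code from the commenter or from OpenAI. The helper `dropout_masks` and the variable names are hypothetical. It shows only that a dropout mask is sampled independently for each example, while a single weight layout is reused for every example in the minibatch.

```python
import random


def dropout_masks(batch_size, n_units, keep_prob, rng):
    """Sample one independent binary dropout mask per input example."""
    return [[1 if rng.random() < keep_prob else 0 for _ in range(n_units)]
            for _ in range(batch_size)]


rng = random.Random(0)
batch_size, n_units = 4, 8

# Activation sparsity: each example in the minibatch gets its own zero pattern.
activation_masks = dropout_masks(batch_size, n_units, 0.5, rng)

# Weight sparsity: one block layout, fixed ahead of time, shared by all examples.
weight_layout = ((1, 0),
                 (0, 1))
layouts_seen_by_examples = [weight_layout for _ in range(batch_size)]

for mask in activation_masks:
    print("activation mask:", mask)
print("distinct weight layouts:", len(set(layouts_seen_by_examples)))
```

Because the weight layout is the same for every row of the minibatch, the list of blocks to compute can be built once and reused, which is what makes this kind of sparsity easier to map onto parallel hardware.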
