Networking Optimizations for Multi-Node Deep Learning on Kubernetes with Erez Cohen

EPISODE 345

About this Episode

Today we conclude our KubeCon '19 series, joined by Erez Cohen, VP of CloudX & AI at Mellanox. We caught up with Erez before his talk, "Networking Optimizations for Multi-Node Deep Learning on Kubernetes," in which he discusses the networking problems and solutions discovered during the journey to reduce training time. In our conversation, we discuss NVIDIA's recent acquisition of Mellanox and what fruit that relationship is expected to bear. We also discuss the evolution of technologies like RDMA, GPUDirect, and SHARP, Mellanox's solution for improving the performance of MPI operations, which can be found in NVIDIA's NCCL collective communications library. Finally, we explore how Mellanox is enabling Kubernetes and other platforms to take advantage of these technologies, and why we should care about networking in deep learning, which is usually regarded as a compute-bound process.
