Today we conclude our KubeCon ’19 series, joined by Erez Cohen, VP of CloudX & AI at Mellanox.
We caught up with Erez before his talk, “Networking Optimizations for Multi-Node Deep Learning on Kubernetes,” in which he discusses the networking problems and solutions discovered during the journey to reduce training time. In our conversation, we discuss NVIDIA’s recent acquisition of Mellanox and what the two companies hope that relationship will yield. We also discuss the evolution of technologies like RDMA, GPUDirect, and SHARP, Mellanox’s solution for improving the performance of MPI operations, which can be found in NVIDIA’s NCCL collective communications library. Finally, we explore how Mellanox is enabling Kubernetes and other platforms to take advantage of these technologies, and why we should care about networking in deep learning, which is a compute-bound process.
Mentioned in the Interview
“More On That Later” by Lee Rosevere, licensed under CC BY 4.0