I recently reported on the launch of the new NVIDIA TITAN X. At the time it wasn’t in the hands of any users, so any thoughts on relative performance were either vendor-provided or speculative. Well, a couple of researchers on the MXNet team were among the lucky few who have the GPU in hand, and this week they published an initial benchmark following the DeepMark deep learning benchmarking protocol.
In a nutshell, they confirmed the speculation. The Pascal-based Titan X is about 30% faster than the GTX 1080, and its larger memory supports larger batch sizes for models like VGG and ResNet. Relative to the older Maxwell-based Titan X, the new GPU is 40–60% faster.
If a single GPU isn’t enough for you, you might be interested in the new prototype announced by Orange Silicon Valley and CocoLink Corp, which they’re calling the “world’s highest density Deep Learning Supercomputer in a box.” The machine packs 20 overclocked GPUs into a single 4U rack unit, offering 57,600 cores delivering 100 teraFLOPS. The team at Orange reports that an ImageNet training job that used to take one and a half days on a single NVIDIA K40 GPU can now be done in 3.5 hours using 8 GTX 1080s. So far they’ve scaled a single training job to 16 GPUs, and they’re continuing to work on scaling to the full 20.
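As a quick back-of-the-envelope check of those reported numbers, here is a sketch of the implied speedup, assuming the quoted times are wall-clock figures for the same training workload:

```python
# Back-of-the-envelope check of the reported ImageNet training speedup.
# Assumption: the quoted times are wall-clock for comparable workloads.

K40_HOURS = 1.5 * 24      # 1.5 days on a single K40 -> 36 hours
GTX1080_HOURS = 3.5       # 3.5 hours on 8x GTX 1080
NUM_GPUS = 8

overall_speedup = K40_HOURS / GTX1080_HOURS    # total wall-clock speedup
per_gpu_factor = overall_speedup / NUM_GPUS    # naive per-GPU factor vs. the K40

print(f"Overall speedup: {overall_speedup:.1f}x")   # ~10.3x
print(f"Per-GPU factor:  {per_gpu_factor:.2f}x")    # ~1.29x, ignoring scaling losses
```

In other words, the 8-GPU box is roughly a 10x wall-clock improvement, which works out to each GTX 1080 contributing at least ~1.3x the throughput of a K40 once multi-GPU scaling overhead is factored in.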
Also in GPU news, Microsoft announced yesterday that Azure N-Series virtual machines are now available in preview. These VMs use Tesla K80 GPUs, and the company claims they offer the fastest computational GPU performance in the public cloud. Moreover, unlike other cloud providers, these VMs expose the GPUs via Discrete Device Assignment (DDA), resulting in near bare-metal performance. 6-, 12- and 24-core flavors are available in the NC series of VMs, which is optimized for computational workloads. An NV series that focuses more on visualization is also available, based on Tesla M60 GPUs.