09 Apr NVIDIA Reinforces Machine Learning Training Lead Via Platform Improvements At GTC 2018

NVIDIA – CEO Jensen Huang kicking off GTC 2018

Last week, Moor Insights & Strategy analysts Anshel Sag, Chris Wilder, Karl Freund and I attended NVIDIA GTC 2018, which I consider the industry leader’s premier GPU developer conference. If you’re unfamiliar with GTC, I wrote a preview here. As usual with these events, the main keynote, given by CEO Jensen Huang, was the time to catch the biggest announcements coming out of the conference. More importantly, the event was also an opportunity to meet with NVIDIA developers, customers, press and senior leaders.

What I want to do here is hit the highlights of the two-and-a-half-hour keynote so that you don’t have to watch it. Here’s my recap of the big announcements, and my take on them. Karl has already written on Arm’s integration of NVIDIA NVDLA here and the NVIDIA Drive simulator here, and analyst Rhett Dillingham also wrote about NVIDIA’s latest and greatest in VDI here.

Ray tracing with Quadro GV100 with NVIDIA RTX

Jensen Huang took the GTC 2018 stage in his trademark leather jacket, and after thanking all developers, launched into the first order of business: NVIDIA RTX ray tracing technology. He spent some time explaining ray tracing technology, and the possibilities it opens for more cinematic effects.

It’s kind of funny—the cinematic vision never shifts, but the way to get there and the reality of it changes. Ray tracing is one of the best ways to create completely realistic objects and scenes, but it has historically been hampered by the fact that it takes so much performance and bandwidth. To address this problem, Huang announced the Quadro GV100 with NVIDIA RTX Technology. Lauded by NVIDIA as a “giant leap for real-time computer graphics,” this new offering sports 32 gigabytes of memory, and the ability to scale up to 64 gigabytes with multiple Quadro GPUs utilizing NVLink interconnect technology.

NVIDIA – Quadro GV100 with RTX Technology

Quadro GV100 is based on the company’s powerful Volta GPU architecture and features NVIDIA’s OptiX AI-denoiser, which NVIDIA says will deliver nearly 100 times the performance of CPUs and allow the offering to achieve real-time, noise-free, cinematic-quality rendering.

This offering looks like it has the potential to address previous issues with ray tracing—Volta’s architecture and the AI denoiser should help address the “last foot” performance side of things, while the NVLink bus will address the bandwidth issues. With NVLink, cards can talk to each other and share memory, lessening the need to go over the PCIe bus. The solution is still limited to certain lighting techniques and is not universally applicable to all types of rendering quite yet. When combined with other rendering techniques, it can deliver the most amazing real-time graphics you’ve ever seen and I’m excited about it.

Deep learning improvements to the Tesla V100

As far as deep learning goes, NVIDIA chose to focus on platform-level advances versus bringing out a brand-new compute architecture or chip. NVIDIA announced at GTC that it is doubling the Tesla V100’s HBM2 memory to 32GB, effective immediately across the entire V100 family. This will reduce the memory copies from the GPU to main memory, which in turn will reduce latency and improve performance. It’s a great upgrade for this workhorse product, coming at a good time.

Funny that not a single commercially focused person I talked with at the event had an issue with the absence of a new chip. Consumer gaming card followers did want to hear about something new but didn’t get it. The lack of a new chip reinforces just how much of a lead NVIDIA has right now in top-end performance. It also signals just how many big things can be done with platform improvements. Sometimes the industry, because of a lack of platform and software knowledge, fixates on new chip hardware.

“The World’s Largest GPU”

Likely the biggest unveiling at GTC 2018 was DGX-2, which NVIDIA is heralding as “the world’s largest GPU.” Purists rolled their eyes, as it isn’t exactly a new GPU, but I get why NVIDIA positioned it this way. What Google calls a TPU has multiple chips with a memory crossbar, IO, memory, storage, and Xeons. DGX-2 is similar and includes storage and two Intel Xeons. DGX-2 is a beast of a platform, not a new GPU, and it is expanding the definition of the GPU. Now NVIDIA and others can compare TPU and GPU performance more equivalently.

NVIDIA – The World’s Largest GPU

DGX-2 features 16 (twice the previous limit) of the new 32GB Tesla V100s, delivering 81,920 CUDA cores, tied together with the new NVSwitch shared memory fabric. The inclusion of the NVSwitch is the biggest news of DGX-2. It enables the improved sharing of memory, reducing swaps to real memory. This new interconnect fabric will increase performance and reduce latency between servers with GPU compute, enabling easier “ganging” together to improve parallelism. It allows up to 16 GPUs to talk to each other at speeds of up to 2.4 terabytes per second.
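As a quick sanity check, the aggregate DGX-2 figures follow from simple multiplication of the per-GPU numbers. The sketch below assumes 5,120 CUDA cores per Tesla V100, which is NVIDIA's published spec rather than a figure stated in this article; the variable names are purely illustrative.

```python
# Back-of-the-envelope DGX-2 totals derived from per-GPU figures.
GPUS = 16                   # Tesla V100s in one DGX-2 (from the article)
HBM2_GB_PER_GPU = 32        # upgraded V100 memory (from the article)
CUDA_CORES_PER_GPU = 5120   # assumption: V100 spec, not stated in the article

total_cuda_cores = GPUS * CUDA_CORES_PER_GPU
total_hbm2_gb = GPUS * HBM2_GB_PER_GPU

print(f"CUDA cores: {total_cuda_cores:,}")
print(f"HBM2 memory: {total_hbm2_gb} GB")
```

The core count matches the 81,920 quoted for the system, and the same arithmetic puts the pooled HBM2 at 512GB across the 16 GPUs.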

This is impressive stuff: it is not only quick to market but also the highest-performance solution. You can buy it now for a mere $399,000.

NVIDIA – DGX-2 up close and personal

I find it fascinating that NVIDIA is doing rack compute servers on its own and cutting out the OEMs, but when you need to get a product out right now, at the highest performance and with the switch, this is what you have to do. This isn’t easy or cheap: you’re not just plugging cards into a PCIe bus, and it takes a lot of development dollars.

Autonomous driving simulator

Jensen also took some time to talk about NVIDIA’s new DRIVE Constellation, a sophisticated, cloud-based simulation system designed to test and validate self-driving cars. The simulator consists of two servers: the first simulates a self-driving car’s various sensors (cameras, radar, etc.), while the second contains an NVIDIA DRIVE Pegasus AI car computer, which processes the simulated data from the first server and feeds its output back to the simulator. If you’d like to read more about DRIVE Constellation, our analyst Karl Freund did a deeper dive here.
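The two-server loop described above can be sketched in a few lines. This is a toy illustration only: `World`, `simulate_sensors`, `drive_computer`, and the one-dimensional road are hypothetical stand-ins chosen for clarity, not NVIDIA's software or APIs.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Hypothetical simulated environment: a car and an obstacle on a straight road."""
    car_position: float = 0.0        # metres
    obstacle_position: float = 50.0  # metres


def simulate_sensors(world: World) -> float:
    """Stand-in for the first server: turn world state into a sensor reading."""
    return world.obstacle_position - world.car_position


def drive_computer(distance_to_obstacle: float) -> float:
    """Stand-in for the second server: choose a speed (m/s) from the reading."""
    return 0.0 if distance_to_obstacle <= 10.0 else 5.0


def run(world: World, steps: int, dt: float = 1.0) -> World:
    for _ in range(steps):
        reading = simulate_sensors(world)   # simulator -> car computer
        speed = drive_computer(reading)     # car computer decides
        world.car_position += speed * dt    # decision fed back into the simulator
    return world


final = run(World(), steps=20)
print(final.car_position)
```

In this toy version the car advances until it is 10 metres from the obstacle and then stops; the point is only the closed loop, in which the simulated readings drive the decision and the decision changes what gets simulated next.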

Announcing NVIDIA Drive Sim and Constellation

This all makes so much sense, if you ask me. Given the company’s background in games and movie creation, NVIDIA can provide graphics just about as realistic as one could possibly want in a simulator. It’s important to note that what a driving system learns will only be as good as the overall simulated environment, including the physics of how objects behave.

I like that car companies can now simulate tens of billions of miles, instead of having to do it in a real car and potentially endangering lives.

Arm integrates NVDLA CNN ASIC into Trillium

There were several important announcements not included in the keynote that I wanted to go ahead and say a few words on. The first was that Arm Holdings would be incorporating NVIDIA IP into its recently announced Project Trillium, a suite of AI hardware for deep learning. NVIDIA’s deep learning accelerator (NVDLA) was open-sourced last fall, giving free license to any developer looking to build a chip that utilizes Convolutional Neural Networks (CNNs) for inference (read analyst Karl Freund’s in-depth coverage here).

This is one of the more interesting announcements, as many had counted NVIDIA out of the “very small edge.” NVIDIA is engaged in AI in places like drones and robots, but this announcement could enable NVIDIA ML tech to be in even smaller IoT devices like home automation and even smartphones. Partnering with Arm does not guarantee NVIDIA NVDLA success at the “very small edge,” but it increases its chances greatly.

Also, don’t assume this is Arm Trillium’s only ML play; it is not. Arm will have its own ML solutions beyond CNNs, and I expect Arm to integrate more AI solutions into its framework going forward.

AIRI platform from Pure Storage and NVIDIA

One last big announcement that didn’t make the keynote cut was a new AI platform called AIRI (AI Ready Infrastructure), co-engineered by Pure Storage and NVIDIA and distributed by their partners. AIRI is a converged infrastructure reference architecture targeted at enterprises, which includes NVIDIA DGX-1s, Pure Storage FlashBlades, and Arista networking. Some think good AI solutions stop at the compute and memory layer; that’s incorrect.

The best AI solutions encompass compute, memory, storage and networking and this is what AIRI does.

Pure Storage is cleaning up in the AI storage scene, as the company has optimized its solutions for very specific AI, ML and DL workloads and processes. The AIRI announcement raises the stakes a notch with a co-developed reference rack architecture that’s been optimized, tested, and validated for NVIDIA CUDA-based ML workloads.

Wrapping up

Clearly, there was no lack of content to chew on at GTC 2018. NVIDIA unveiled new platforms, solutions, and technology (Quadro GV100 with NVIDIA RTX, the beastly DGX-2, NVSwitch memory fabric, AIRI), as well as a welcome memory upgrade to the Tesla V100. The DRIVE Constellation simulator looks to be a smart solution that takes advantage of NVIDIA’s strengths on both the commercial and consumer sides, and I expect it is going to help many car companies bring their self-driving vehicles to market more quickly (and more safely). These are the sort of big announcements you would expect from a company on its A-game, which is precisely where NVIDIA is sitting right now.