28 Sep NVIDIA Targets Next AI Frontiers: Inference And China

NVIDIA’s meteoric growth in the datacenter, where its business is now generating some $1.6B annually, has been driven largely by the demand to train deep neural networks for Machine Learning (ML) and Artificial Intelligence (AI), an area where the computational requirements are simply mind-boggling. Much of this business comes from the largest datacenter operators in the US, including Amazon, Google, Facebook, IBM, and Microsoft. NVIDIA recently announced new technology and customer initiatives at its annual Beijing GTC event to help drive revenue in the inference market for Machine Learning and to solidify the company’s position in the huge Chinese AI market. For those unfamiliar, inference is the stage in which a trained neural network is used to make predictions about, or classify, new data. The inference market will likely end up larger than the training market in terms of chip unit volumes; after all, once you train a neural network, you probably intend to use it, and use it a lot. It is therefore critical that NVIDIA capture its share of this market as AI moves from early R&D to commercial deployment, both in the cloud and at the edge.
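For readers who want something concrete, inference can be sketched in a few lines of Python. This is a toy illustration only: the weights, biases, and labels below are made-up stand-ins for parameters that a real training run would produce, and `infer` is a hypothetical helper, not part of any NVIDIA software.

```python
import math

# Hypothetical "trained" parameters for a tiny 2-input, 2-class classifier.
# In practice these would be produced by training; here they are invented.
WEIGHTS = [[2.0, -1.0],   # weights for class 0
           [-1.0, 2.0]]   # weights for class 1
BIASES = [0.0, 0.0]
LABELS = ["cat", "dog"]   # illustrative class names


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def infer(sample):
    """Run one forward pass with fixed weights; no learning happens here."""
    scores = [
        sum(w * x for w, x in zip(row, sample)) + b
        for row, b in zip(WEIGHTS, BIASES)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, confidence = infer([1.0, 0.0])
print(label, round(confidence, 3))
```

The point of the sketch is that inference only reads the parameters and never updates them, which is why the same trained network can be run over and over at far higher volume than it was trained.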

What did NVIDIA announce?

As is typically the case, NVIDIA’s CEO, Jensen Huang, made these announcements during a keynote address at the GPU Technology Conference (GTC) in Beijing, the first stop on a worldwide tour of GTC events. First, and perhaps most importantly, Huang announced new TensorRT3 software that optimizes trained neural networks for inference processing on NVIDIA GPUs. TensorRT3 can be used to package, or compile, neural networks built with any ML framework, for deployment across the NVIDIA portfolio of datacenter and edge devices. TensorRT is essentially the CUDA of inferencing. Huang also announced that TensorRT3 is now being deployed for ML workloads by China’s largest Internet companies, namely Alibaba, Baidu, Tencent, and JD.com.

Figure 1: TensorRT software is the cornerstone that should enable NVIDIA to deliver optimized inference performance in the cloud and at the edge.

In addition to announcing the Chinese deployment wins, Huang provided some compelling benchmarks to demonstrate the company’s prowess in accelerating Machine Learning inference operations, in the datacenter and at the edge. Comparing the two V100 (Volta) results in Figure 2, note the ~20X increase in performance directly attributable to the new NVIDIA software.

Figure 2: TensorRT3 performance for inference processing of images (ResNet-50)

In addition to the TensorRT3 deployments, Huang announced that the largest Chinese Cloud Service Providers, Alibaba, Baidu, and Tencent, are all offering the company’s newest Tesla V100 GPUs to their customers for scientific and deep learning applications. For customers wanting to deploy deep learning in their own datacenters, he announced that Huawei, Inspur, and Lenovo would be selling HGX-based servers with Volta to their global customer base. HGX is an 8-GPU chassis with the NVLink interconnect, used to provide high levels of GPU scaling in a dense package. HGX, announced earlier this year, was designed with Microsoft and is available as an open source hardware platform through the Open Compute Project. The Lenovo win is significant: the company has been seeking a high-density GPU server for large-scale training workloads and is, at least for now, the only global OEM to offer HGX.


Figure 3: The Open Compute HGX platform allows 8 P100 or V100 GPUs to connect to any server for Machine Learning acceleration

Continuing with the theme of inference processing in China, NVIDIA also announced that JD.com’s delivery subsidiary would be using the NVIDIA Jetson platform to guide and control its ground and aerial drone delivery services. Delivering products over China’s crowded highways is unreliable and time-consuming. To address this growing challenge, JD.com plans to have a million drones, with NVIDIA Jetson on board, in service by 2020.

Figure 4: These self-piloting drones will help JD.com quickly deliver goods through or above the congested Chinese urban transportation system.

Conclusions

As Machine Learning matures beyond the research and development stage, attention is turning to the processing needs for inference. The data fed to a trained network can be quite simple, such as text or images, or incredibly demanding, such as real-time spoken translation and high-definition video or Lidar. The corresponding processing requirements will therefore range from simple mobile processors in our phones to miniature supercomputers in our autonomous vehicles. NVIDIA is not content with just being the brains behind the creation of these AIs, and is positioning itself to compete with CPUs, FPGAs, and ASICs for the coming explosion in datacenter and edge ML processing. The customer wins announced by Mr. Huang demonstrate that the company has what it takes to be a player in the next phase of Machine Learning and AI. However, unlike training, which has been an all-NVIDIA show, inference spans such a diversity of data, latency, and power requirements that it will give rise to a wide range of solutions and an interesting competitive landscape.