03 Mar A Machine Learning Landscape: Where AMD, Intel, NVIDIA, Qualcomm And Xilinx AI Engines Live

Without a doubt, 2016 was an amazing year for Machine Learning (ML) and Artificial Intelligence (AI) awareness in the press. But most people probably can’t name three applications for machine learning beyond self-driving cars and perhaps the voice-activated assistant hiding in their phone. There’s also a lot of confusion about where the Artificial Intelligence program actually runs. When you ask Siri to play a song or tell you what the weather will be like tomorrow, does “she” live in your phone or in the Apple cloud? And what about Amazon’s Alexa? Where does “she” live? (The answer to both questions is, “In the cloud”.) And while you ponder those obscure questions, many investors and technology recommenders are trying to determine whether Advanced Micro Devices, Intel, NVIDIA, Qualcomm or Xilinx will provide the best underlying hardware chips, for which applications and why. To help sort this out, this article provides a landscape of emerging AI applications, by industry and deployment location (cloud, edge or hybrid), and explores what type of hardware will likely be used in each.

The Landscape: By Industry and Deployment Location

The sheer volume of applications being built using Machine Learning is truly breathtaking, as evidenced by over 2,300 investors funding over 1,700 startups, according to data compiled by Angel List. The graphic below shows a Machine Learning application landscape, using broad categories of applications that may run on simple or specialized edge devices, on servers in the cloud, or in a hybrid configuration using edge devices with tightly coupled cloud resources.

A Machine Learning application landscape (Source: Moor Insights & Strategy)

The Hardware: CPUs, GPUs, ASICs and FPGAs

As I have explored in previous articles, there are two aspects of Machine Learning: training the neural network with massive amounts of sample data and then using the trained network to infer some attribute about a new data sample. The job of training the network to “think” is typically performed in large datacenters on GPUs, almost exclusively provided by NVIDIA. Since that market domination appears to be pretty stable, at least for the time being (see my article about Intel’s acquired Nervana Technology for a potential challenger), I will focus here on the hardware used in inference, where the AI is actually deployed. The graphic below lays out the wide range of hardware targeting Machine Learning from leading vendors.
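To make the training-versus-inference distinction concrete, here is a minimal toy sketch (a hypothetical example of my own, not any vendor’s stack): “training” iteratively fits weights on labeled samples, while “inference” simply applies the frozen weights to new data. Real neural networks do this at vastly larger scale, which is exactly why the two phases favor different hardware.

```python
def train(samples, labels, lr=0.1, epochs=20):
    """Training: repeatedly adjust weights on labeled sample data.

    This toy fits a single perceptron-style neuron on 2-D inputs;
    real training runs billions of such updates, hence the GPU farms.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred                 # learn only from mistakes
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def infer(w, b, x):
    """Inference: one cheap forward pass with frozen weights."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Learn a simple AND-style decision rule from four labeled samples.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train(samples, labels)
print([infer(w, b, s) for s in samples])  # prints [0, 0, 0, 1]
```

Note the asymmetry: training loops over the data many times and mutates state, while inference is a single fixed computation per sample. That is why inference can be pushed onto low-power edge silicon even when training required a datacenter.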

When it comes to Machine Learning, the fact is that there is no “One Chip to Rule Them All”. While every vendor claims its architecture (CPU, GPU, ASIC or FPGA) is “the best” for AI and Machine Learning, each has its advantages for a specific type of application or data, deployed in a specific environment. Data complexity and velocity determine how much processing is needed, while the environment typically determines the latency demands and the power budget.

CPUs, like Intel’s Xeon and Xeon Phi in the datacenter and the Qualcomm Snapdragon in mobile devices, do a great job on relatively simple data like text and JPEG images once the neural network is trained, but they may struggle to handle high-velocity, high-resolution data coming from devices like 4K video cameras or radar. To help address this, Intel has pre-announced a new version of its multi-core Xeon Phi, code-named Knights Mill, which is expected to be available later this year. However, in many cases the job may require a GPU, an ASIC like Intel’s expected Nervana Engine, or perhaps an FPGA programmed to meet the demands of a low-latency, low-power environment such as a vehicle, an autonomous drone or a missile. While the NVIDIA GPU will win most drag races for the fastest solution (throughput), the FPGA (typically from Intel or Xilinx) affords the ability to reconfigure the hardware as acceleration algorithms evolve, as well as providing very low latencies. In the cloud, we see a similar situation, where GPUs, FPGAs and ASICs like Google’s TPU (Tensor Processing Unit) each offer unique capabilities and cost / benefit advantages for specific data types and throughput requirements.

Some applications, such as vision-guided autonomous systems, require a hybrid hardware approach to meet the latency and data processing requirements of the application environment. While the accelerators mentioned above do a great job of running the AI inference engine, sensor fusion, data pre-processing and post-scoring policy execution require a lot of special I/O and fast traditional logic best suited to CPUs. To solve this challenge, NVIDIA offers hybrid hardware platforms with an ARM / GPU combo in its Jetson and DRIVE PX 2, while Intel and Xilinx offer SoCs that marry ARM cores and FPGAs into a single, elegant low-power package. All of these products are finding their way into drones, factory robots / cobots and automobiles, where the right combination of speed, flexibility and low power demands innovative approaches.

Not to be outdone, Qualcomm has been busy beefing up its Snapdragon processor to include a variety of accelerator technologies to support Machine Learning in mobile and other edge devices that will comprise a smart Internet of Things (IoT). In fact, the most recent Snapdragon 835 combines a CPU with a GPU and a DSP (digital signal processor) to meet the variety of programming and hardware models being used to speed up Machine Learning algorithms.

As you can tell, one size does not fit all in meeting the computational needs of the emerging Machine Learning application landscape. The result will be more choices for engineering / design teams and tailored solutions for the intelligent systems, products and services that are being built. For more detailed analysis, please see the recently published paper by Moor Insights & Strategy on this topic.