23 Jan 2019: A Cambrian Explosion In Deep Learning, Part 1
I started out writing a single blog post on the coming year’s expected AI chips and how NVIDIA might respond to the challenges, but I quickly realized it was going to be much longer than expected. Since there is so much ground to cover, I’ve decided to structure this as three hopefully more consumable articles. I’ve included links to previous missives for those wanting to dig a little deeper.
- Part 1: Introduction and the large players trying to attack NVIDIA: Intel, AMD, Google, Xilinx, Apple, and Qualcomm
- Part 2: Startups and China Inc. and the roles each may play
- Part 3: Potential NVIDIA strategies to fend off would-be challengers
In the last five years, NVIDIA grew its data center business into a multi-billion-dollar juggernaut without once facing a credible competitor. This is an amazing fact, and one that is unparalleled in today’s technology world, to my recollection. Most of this meteoric growth was driven by demand for fast GPU chips for Artificial Intelligence (AI) and High-Performance Computing (HPC). NVIDIA’s CEO, Jensen Huang, likes to talk about the “Cambrian Explosion” in deep learning, referring specifically to the rapid pace of innovation in neural network algorithms. We will touch on what this means for NVIDIA in Part 3, but I chose to borrow the concept for the title of this series. We are at the doorstep of an explosion in specialized AI silicon, from many large and small companies around the world. Three years ago, it was next to impossible to get venture funding for a silicon startup. Now, there are dozens of well-funded challengers building chips for AI.
Last year, NVIDIA and IBM reached the pinnacle of computing with the announcement that they were powering the world’s fastest supercomputer, ORNL’s Summit (which owes some 95% of its performance to NVIDIA’s Volta GPUs). While this is an incredible accomplishment, many are beginning to wonder whether this whole fairy tale can last for NVIDIA.
In the latest reported quarter, NVIDIA’s data center revenue grew by 58% year-over-year to $792M, nearly 25% of the company’s total revenue, bringing the total to $2.86B over the last four quarters. If the company can maintain that growth rate, it could generate some $4.5B in data center revenue in 2019 (roughly $2.86B × 1.58). Sounds like heaven, or at least heaven on earth, right?
Without a doubt, NVIDIA builds great products driven by its powerful vision of one extensible architecture. NVIDIA now enjoys a robust and self-sustaining ecosystem of software, universities, startups, and partners that have enabled it to become the master of its own newly created universe. While some would argue that this ecosystem creates an impenetrable defensive moat, storm clouds are now appearing on the horizon. Potential threats are coming from Intel, Google, AMD and scores of US and Chinese startups, all drawn into the feeding frenzy of AI.
So far, in my opinion, the competition has mostly been smoke with very little fire. Dozens of announcements have been made by competitors, but I am pretty confident that none of them, outside of Google, has actually taken any revenue from NVIDIA’s coffers. Let’s survey the competitive landscape as it currently stands, with an eye toward what’s shaping up to be a very interesting 2019.
The large challengers
While the New York Times counted over 40 startups entering this space, let’s be realistic: there is only room for a handful of companies to be truly successful in this market (say revenues greater than $1B). For training Deep Neural Networks (DNNs), NVIDIA will be very hard to beat, given the strength of its products, its installed base, and its pervasive ecosystem. However, the inference market, which is currently quite small, will eventually exceed the training market in total revenue. Unlike training, inference is not a monolithic market. It is composed of a myriad of data types and associated optimized deep learning algorithms in the cloud and at the edge, each with specific performance, power, and latency requirements. Additionally, there isn’t an 800-pound incumbent gorilla in inference—even in the automotive market where NVIDIA has laid claim to leadership. For these reasons, inference is where most of the new entrants will primarily or initially focus. Let’s look at the large players vying for a place at the table.
Google

One of the first companies to demonstrate that a specialized chip (known as an ASIC, or Application-Specific Integrated Circuit) can counter the more programmable and general-purpose (I can’t believe I just said that!) GPU for deep learning was Google—which, coincidentally, is probably one of NVIDIA’s largest customers. As I have previously covered, Google has now released four “Tensor Processing Units” (TPUs)—chips and boards that accelerate deep learning training and inference processing in the cloud and, more recently, at the edge. The performance of a Google TPU for training and processing a DNN is pretty solid, delivering up to 45 trillion operations per second (TOPS) per chip. This compares to NVIDIA’s Volta, which tops out at 125 TOPS. The first couple of TPUs were really for internal use and bragging rights, but Google now makes them available as a service to its cloud customers on Google Cloud.
While TPUs have certainly put a kick into Google’s AI step, the market they serve outside of Google’s internal use cases (which, granted, is quite a large market) is intentionally restricted. TPUs can only be used to train and run models built with Google’s TensorFlow AI framework; you cannot use them for AI built with Apache MXNet or PyTorch, the fast-growing framework backed by Facebook and Microsoft. Nor can you use them for non-AI HPC applications, where GPUs reign supreme. Additionally, you cannot buy TPUs for on-premises computing in corporate or government data centers and servers. But Google is OK with all that, since it views TPUs and TensorFlow as strategic to its AI leadership across the board. Software that is optimized for its hardware that is optimized for its software can make for a powerful and durable platform.
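To make that framework coupling concrete, here is a minimal sketch of what targeting a Cloud TPU looks like using TensorFlow’s TPU distribution APIs as they exist in TensorFlow 2.x (the TPU node name is a placeholder and the model is a toy). The point is simply that everything is expressed through TensorFlow; there is no equivalent path for MXNet or PyTorch code.

```python
import tensorflow as tf

# Attach to a Cloud TPU node ("my-tpu-node" is a placeholder name).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu-node")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Everything created under this strategy scope is placed and trained on the TPU cores.
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(...) would then run the training loop on the TPU rather than a local CPU/GPU.
```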
The more immediate impact of the TPU may be to validate the ASIC concept as an alternative to the GPU, at least in the eyes of potential investors. The CEO of one deep learning chip startup shared with me that venture capital began flowing freely once Google announced its TPU. He has subsequently raised hundreds of millions of dollars.
Google has been adept at stealing some limelight from NVIDIA’s predictable announcements at the GPU Technology Conference (usually in March) and I would not be surprised to see the company at it again this year—perhaps with a 7nm TPU product with impressive performance numbers.
Not to be outdone, Amazon Web Services announced last fall that it, too, was building a custom ASIC, dubbed Inferentia, for inference processing. However, the chip is still in development, and the company has shared few details on the design or availability.
Intel

This gets a little more complicated, since Intel is such a large player and has at least one iron in every fire. While the company intends to compete for AI training and inference with Nervana chips in “late 2019,” it realizes that inference will become the larger market, and it has a very strong hand to play there. In addition to Xeon CPUs (which were recently updated with significantly improved inference performance), the company acquired Mobileye and Movidius, for automotive and embedded inference processing respectively. I have seen demos of both devices, and they are indeed impressive. Intel has also invested in a run-anywhere software stack, called OpenVINO, which allows developers to train anywhere and then optimize and run the resulting model on any Intel processor. Smart.
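For a sense of what that “train anywhere, run on any Intel processor” pitch looks like in practice, here is a minimal sketch using OpenVINO’s Python runtime. The file names and input shape are placeholders, and the API shown reflects the current openvino.runtime package rather than the exact tooling shipping at the time: a network trained in any framework is converted to OpenVINO’s intermediate representation (IR), and the same IR is then compiled for whichever Intel target is available.

```python
import numpy as np
from openvino.runtime import Core  # OpenVINO's Python inference runtime

core = Core()

# "model.xml"/"model.bin" are placeholders for a network trained elsewhere
# and converted to OpenVINO's intermediate representation (IR).
model = core.read_model("model.xml")

# The same IR can be compiled for different Intel targets, e.g. "CPU", "GPU",
# or "MYRIAD" (Movidius); this retargeting is the portability pitch.
compiled = core.compile_model(model, device_name="CPU")

input_tensor = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input
results = compiled([input_tensor])  # run inference on the chosen device
```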
In a revelation at CES in Las Vegas, Intel disclosed that it is working closely with Facebook on the inference version of the Nervana Neural Network Processor (NNP-I)—surprising, because many had predicted that Facebook was working on its own inference accelerator. Meanwhile, Naveen Rao, Intel’s VP and GM of AI products, shared on Twitter that the NNP-I will be an SoC (system-on-a-chip), built in Intel’s 10nm fab, that includes Ice Lake x86 cores. Mr. Rao indicated this would be a common theme for Intel going forward, perhaps a reference to future x86/GPU desktop and laptop chips akin to AMD’s APUs.
For training, Intel’s original plan was to announce a “Lake Crest” Nervana NNP in 2017, a year after the Nervana acquisition. Then it slipped to 2018, and then, well, the company decided to start over. This was likely not because the first Nervana part wasn’t any good; rather, the company realized the device just wasn’t good enough to substantially outperform NVIDIA and the Tensor Cores it added to Volta and subsequent GPUs. We will see this movie play out again, I suspect, when NVIDIA unveils whatever surprises it is cooking up for its 7nm part—but I’m getting ahead of myself.
Qualcomm and Apple
I include these two companies for the sake of completeness, as both are delivering impressive AI capabilities focused on mobile handsets (and, in Qualcomm’s case, IoT devices and autonomous vehicles). Apple, of course, focuses on its A-series CPUs for iPhones and on iOS support for on-device AI. As mobile becomes a dominant platform for AI inference in speech and image processing, these two players have a lot of IP they can use to establish leadership (although Huawei is also pushing very hard on AI, as we will cover in Part 2).
AMD

AMD has been hard at work for the last three years getting its AI software house in order. When I worked there in 2015, you couldn’t even run its GPUs on a Linux server without booting Windows. The company has come a long way since then, with ROCm software and compilers to simplify migration from CUDA, and MIOpen (not to be confused with OpenML) to accelerate deep learning math libraries on its chips. Currently, however, AMD’s GPUs remain at least a generation behind NVIDIA’s V100 for AI, and the V100 is approaching two years old. It remains to be seen how well AMD can compete with NVIDIA’s Tensor Cores at 7nm. AMD may decide to focus more on the larger inference market, perhaps with a semi-custom silicon platform for autonomous vehicles akin to the NVIDIA Xavier SoC. Time will tell.
Xilinx

Make no mistake, Xilinx, the leading vendor of programmable logic devices (FPGAs), had a fantastic 2018. In addition to announcing its next-generation architecture for 7nm, it scored significant design wins at Microsoft, Baidu, Amazon, Alibaba, Daimler, and others. In AI inference processing, FPGAs have a distinct advantage over ASICs because they can be reconfigured on the fly for the specific job at hand. This matters a lot when the underlying technology is changing rapidly, as is the case for AI. Microsoft, for example, showed off how its FPGAs (now from Xilinx as well as Intel) can use 1-bit, 3-bit, or practically any precision math for specific layers in a deep neural network. This may sound like a nerdy nit, but it can dramatically speed processing and reduce latencies, all while using far less power. Additionally, the upcoming 7nm chip from Xilinx, called Versal, has AI and DSP engines to speed up application-specific processing alongside the adaptable logic arrays. Versal will start shipping sometime this year, and I think it could be a game changer for inference processing.
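As an illustration of why those reduced bit widths matter, here is a generic sketch of uniform low-precision quantization (this is not Microsoft’s or Xilinx’s actual number format, just the basic idea): a layer’s 32-bit floating-point weights can be squeezed down to 3-bit or even 1-bit codes with only a modest approximation error, cutting the memory traffic and logic an FPGA has to devote to each multiply-accumulate.

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Uniform symmetric quantization: map float weights to small integer codes
    plus a single per-tensor scale used to reconstruct approximate values."""
    levels = 2 ** (bits - 1) - 1              # e.g. 3 bits -> integer codes in [-3, 3]
    scale = np.max(np.abs(w)) / levels
    codes = np.clip(np.round(w / scale), -levels, levels).astype(np.int8)
    return codes, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)   # a toy layer's weights

codes3, scale3 = quantize_symmetric(w, bits=3)
w3 = codes3 * scale3                          # what the hardware effectively computes with

# 1-bit ("binarized") weights keep only the sign plus one scale for the whole tensor.
w1 = np.sign(w) * np.mean(np.abs(w))

print("3-bit mean abs error:", np.mean(np.abs(w - w3)))
print("1-bit mean abs error:", np.mean(np.abs(w - w1)))
```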
In the second blog of this three-part series, I will explore a few of the startups in the West and in China that are lining up to play an important role in the world of AI hardware. Thanks for reading, and stay tuned!