13 Apr Google’s TPU For AI Is Really Fast, But Does It Matter?
Nearly a year after introducing the Tensor Processing Unit, or TPU, Google has finally released detailed performance and power metrics for its in-house AI chip. The chip is impressive on many fronts; however, Google understandably has no plans to sell it to its competitors, so its impact on the industry is debatable. So, who really benefits from this ninja chip for AI, and who is potentially exposed to incremental risk by it? I think the answer is everyone, and no one, respectively. Here’s why.
What is a TPU and how does it stack up?
The challenge Google was facing a few years ago was that it foresaw a dramatic shift in its computing needs towards supporting Machine Learning workloads. These applications are profoundly compute-intensive, and continuing to use (Intel) CPUs was cost-prohibitive and would not meet its needs for rapid response times across millions of simultaneous users and queries. Google was using NVIDIA GPUs for training the underlying neural networks that allow machines to recognize patterns in the data, and using x86 CPUs to then execute the queries across the neural network, a step called inferencing. While large GPUs for training are fairly expensive, the larger volume of work would be in these inference engines. So, Google decided to develop a chip that could handle this workload at a lower cost, with higher performance, while consuming far less power.
Google’s TPU sits on a PCIe Card and fits in a standard disk drive bay. You can have multiple TPUs per server.
Google has recently released extensive architectural details and performance data that show the fruits of its labor. Understandably, it compared the TPU with the generation of NVIDIA and Intel chips that it had at its facility at the time; Intel’s Haswell is three generations old and the NVIDIA Kepler was architected in 2009, long before anyone was using GPUs for machine learning. Now, NVIDIA CEO Jensen Huang has been kind enough to provide updated comparisons to NVIDIA’s latest generation of chips, based on NVIDIA Pascal. Comparing current-generation chips makes a huge difference: NVIDIA’s deficit of delivering only 1/13th the performance of the TPU turns into a 2X advantage for NVIDIA, albeit at 3x the power consumption.
These two approaches produce very different results. The P40 has strong floating point, useful in training, and greater memory bandwidth. In raw integer throughput, the TPU screams at 90 trillion operations per second, nearly twice that of the GPU, and consumes only 1/3rd the power. Keep in mind that the GPU being measured is just one instantiation of the Pascal architecture; NVIDIA is able to productize a single architecture to address many distinct markets, including gaming, Machine Learning (ML training and inference), automotive and supercomputing. The GPU is a programmable device and as such is a general-purpose accelerator. The TPU, on the other hand, is designed to do one thing extremely well: multiply, in parallel, the tensors (integer matrices) that represent the (deep) neural networks used in Machine Learning for AI.
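To make the efficiency trade-off concrete, here is a minimal Python sketch that turns the ratios quoted above into relative performance per watt. It uses only the rounded ratios from the text, not measured data. The function name `relative_perf_per_watt` is hypothetical, and reading the two comparisons as different metrics (delivered performance versus raw operations per second) is an assumption.

```python
def relative_perf_per_watt(perf_ratio: float, power_ratio: float) -> float:
    """Performance per watt of chip A relative to chip B.

    perf_ratio  = (performance of A) / (performance of B)
    power_ratio = (power of A) / (power of B)
    """
    if perf_ratio <= 0 or power_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_ratio / power_ratio


# Comparison 1 (from the text): the GPU delivers 2X the TPU's performance
# at 3x the TPU's power consumption.
gpu_vs_tpu_delivered = relative_perf_per_watt(perf_ratio=2.0, power_ratio=3.0)

# Comparison 2 (from the text): the TPU's raw operations per second are
# "nearly twice" the GPU's at one third of the power. The value 2.0 is a
# rounded assumption standing in for "nearly twice".
tpu_vs_gpu_raw_ops = relative_perf_per_watt(perf_ratio=2.0, power_ratio=1.0 / 3.0)

print(f"GPU vs TPU, delivered performance per watt: {gpu_vs_tpu_delivered:.2f}x")
print(f"TPU vs GPU, raw operations per watt: {tpu_vs_gpu_raw_ops:.1f}x")
```

Under these rounded inputs, the GPU's 2X performance lead shrinks to about two thirds of the TPU's efficiency once power is factored in, while on raw operations the TPU's per-watt lead is roughly sixfold. Which ratio matters depends on whether a datacenter is limited by throughput or by power.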
So, who benefits from the TPU, and who might be hurt by it? Users of Google Machine Learning services will directly benefit as more services move over to run on TPU; Google has lowered the price of selected services by as much as 6x, directly attributing the savings to the TPU. So, Google wins by having a more competitive platform for internal use and cloud ML services and by saving on its CAPEX and power consumption for its massive datacenters.
Does the TPU represent a risk to silicon vendors such as Intel and NVIDIA? I think not, at least not directly and not immediately. First, most inference work today is done by Intel Xeon CPUs in the datacenter and ARM CPUs at the edge and is deployed at a more modest scale than seen at Google. And Google is still using NVIDIA GPUs for training its neural networks. So it is not like the TPU took a big chunk out of NVIDIA’s business, if any. Intel wouldn’t have been able to deliver the performance Google needed, so this is a case of giving up sleeves out of its vest. (Note that the TPU is still an accelerator hanging off an Intel Xeon server.)
Second, consider that the TPU is only available to Google’s internal data scientists and to users of Google’s AI cloud services. Google Cloud remains a distant third to Amazon Web Services and Microsoft Azure, both of which offer NVIDIA GPUs in their cloud services for Machine Learning applications. Looking ahead, I would not be surprised to see Google develop a training chip at some point to realize further cost savings for its growing AI portfolio. But again, that would only impact Google’s purchases for its own use, not the purchases by the other six of the world’s largest datacenters (Amazon, Alibaba, Baidu, Facebook, Microsoft and Tencent). These guys will all continue to purchase GPUs and FPGAs for their acceleration workloads, until and unless a better alternative comes along.
Given the rapid market growth and thirst for more performance, I think it is inevitable that silicon vendors will introduce chips designed exclusively for Machine Learning. Intel, for example, is readying the Nervana Engine technology it acquired last August, most likely for both training and inference. And I know of at least four startups, including Wave Computing, NuCore, Graphcore and Cerebras, that are likely to be developing customized silicon and even systems for Machine Learning acceleration. Certainly, more competition and alternatives in this space will fuel more adoption and innovation, which benefits everyone in the market.
As for the market leader, NVIDIA is unlikely to be left in the dust. NVIDIA can also incorporate new techniques in its hardware specifically for Machine Learning, and it can continue to optimize its software ecosystem to keep pace. Just last year, NVIDIA set the new standard for reduced-precision matrix operations on 16-bit floating point and 8-bit integer values (for training and inference, respectively). All other silicon vendors, with the notable exception of Xilinx, are at least a year behind NVIDIA in adopting this approach, which can double or quadruple performance and power efficiency. Finally, NVIDIA’s NVLink interconnect is still the only viable contender to support strong scaling of cooperating accelerators. (IBM OpenCAPI is the sole alternative, and even IBM supports both.)
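For readers unfamiliar with reduced-precision inference, the following is a minimal pure-Python sketch of the general idea: floating-point values are mapped to 8-bit integers, the matrices are multiplied with integer arithmetic, and the result is rescaled. It is an illustration under simple assumptions (one symmetric scale per matrix, tiny made-up inputs), not a description of how NVIDIA's or Google's hardware implements it. All function names and values are hypothetical.

```python
def max_abs(matrix):
    """Largest absolute value in a matrix given as a list of rows."""
    return max(abs(x) for row in matrix for x in row)


def quantize(matrix, scale):
    """Map floats to signed 8-bit integers: round(x / scale), clamped to [-128, 127]."""
    return [[max(-128, min(127, round(x / scale))) for x in row] for row in matrix]


def matmul(a, b):
    """Plain matrix multiply; works for int or float entries."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


# Made-up example values (assumptions for illustration only).
activations = [[0.5, -1.0], [0.25, 0.75]]
weights = [[1.0, 0.5], [-0.5, 0.25]]

# One symmetric scale per matrix, so the largest magnitude maps to 127.
a_scale = max_abs(activations) / 127
w_scale = max_abs(weights) / 127

q_activations = quantize(activations, a_scale)
q_weights = quantize(weights, w_scale)

# Integer multiply-accumulate. Python ints do not overflow; real hardware
# would use accumulators wider than 8 bits for the same reason.
q_product = matmul(q_activations, q_weights)

# Rescale the integer result back to floating point.
approx = [[v * a_scale * w_scale for v in row] for row in q_product]
exact = matmul(activations, weights)

print("int8 approximation:", approx)
print("float reference:   ", exact)
```

The approximation differs from the floating-point reference only by a small rounding error, which is why many inference workloads tolerate 8-bit integers. The performance and efficiency gains cited above come from moving and multiplying narrower values.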
Google is a world leader in developing and using Machine Learning algorithms and hardware in its vast Internet search operations and cloud service offerings. It uses Machine Learning for everything from Google Translate, which supports over 100 languages, to Google Now, to building an AI that beat the world champion at Go. So it makes sense that it would want to invest in customized hardware that can deliver the most performance for its software. The performance and architectural details it has recently shared demonstrate its prowess in designing ASICs to accelerate machine learning, and it is likely that its TPU presages other designs that will further challenge the status quo. I am certain that the other large internet datacenters will do the math to evaluate the ROI of similar efforts for their own use, but I suspect they do not yet have the scale required to justify a development investment of perhaps $100M a year. But you can be sure that the machine learning and AI market is still in its infancy, and we will see many innovations in hardware and software in the coming years.