22 May Google Cloud TPU: Strategic Implications For Google, NVIDIA And The Machine Learning Industry

Google announced the 2nd generation of the company’s Tensor Processing Unit (TPU), now called the Cloud TPU, at the annual Google I/O event, wowing the industry with Machine Learning performance that appeared to eclipse NVIDIA’s Tesla Volta GPU only one week after that chip was launched. (See below for why I say “appeared”.) Unlike Google’s 1st TPU ASIC (Application Specific Integrated Circuit), this new chip is designed to support the training of neural networks for AI, as well as the use of trained networks, or inference.

Tactically, this chip should provide significant cost savings for Google, widely believed to be the largest consumer of Machine Learning chips in the world. Strategically, it provides a computation platform tailored to enable the company’s AI-centric global businesses. I apologize in advance for the length of this article, but this technology has far-reaching implications.

What Did Google Announce?

Google announced a new ASIC that will accelerate its internal machine learning algorithms, as well as provide a compelling platform for AI practitioners to use the Google Cloud for their research, development and production AI work. The 2nd generation TPU chip delivers 45 Trillion Floating Point Operations Per Second (presumably 16-bit TFLOPS) for Machine Learning, roughly twice what is available today from the NVIDIA P100 (20 TFLOPS) or Advanced Micro Devices’ upcoming Vega GPU (25 TFLOPS); however, it will be surpassed by NVIDIA’s new Volta chip, described below. The “Cloud TPU” is packaged on a 4-chip module complete with a fabric to interconnect these powerful processors, allowing very high levels of scaling. This scaling capability is important, because training a neural network can take advantage of an almost limitless supply of accelerators.

The 4 chip Cloud TPU board forms the building block node for interconnecting 1000s of TPUs in a cluster for research and cloud services. There were no visible signs of active cooling, and the company did not disclose power consumption details. (Source: Google)

The 4-chip Cloud TPU board therefore delivers 180 TFLOPS. The boards were shown in a “TPU Pod” with 64 interconnected boards, delivering 11.5 PetaFlops of peak performance, effectively a large supercomputer in a single rack.

Two TPU “Pods” in the center, with server racks on the side, deliver 11.5 PetaFlops each for Machine Learning. (Source: Google)

Google also announced the TensorFlow Research Cloud, a 1,000-TPU (4,000 Cloud TPU chip) supercomputer delivering 180 PetaFlops of compute power (a PetaFlop is one thousand trillion, or one quadrillion, FLOPS; presumably 16-bit here), available free to qualified research teams. While this is similar in concept to NVIDIA’s Saturn V supercomputer, only significantly larger, the Google supercomputer is designed to support only Google’s own open-source TensorFlow Machine Learning framework and ecosystem, while Saturn V is available for all types of software.
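The peak figures quoted above all follow from the 45 TFLOPS per-chip number by simple multiplication. The short sketch below reproduces that arithmetic. It assumes perfect linear scaling (peak, not sustained, throughput) and takes 64 boards per pod, the configuration Google described at the announcement; the function and constant names are mine, for illustration only.

```python
# Peak-throughput arithmetic for the Cloud TPU figures quoted in this article.
# Assumption: perfect linear scaling, so these are peak numbers, not measured ones.

TFLOPS_PER_CHIP = 45   # per-chip figure quoted above (presumably 16-bit)
CHIPS_PER_BOARD = 4    # one "Cloud TPU" board carries four chips


def peak_tflops(boards: int) -> int:
    """Peak TFLOPS for a cluster of `boards` Cloud TPU boards."""
    return boards * CHIPS_PER_BOARD * TFLOPS_PER_CHIP


board_tflops = peak_tflops(1)                    # one Cloud TPU board
pod_pflops = peak_tflops(64) / 1000              # one TPU Pod, assuming 64 boards
research_cloud_pflops = peak_tflops(1000) / 1000  # TensorFlow Research Cloud

print(f"Board:          {board_tflops} TFLOPS")
print(f"TPU Pod:        {pod_pflops:.2f} PFLOPS")
print(f"Research Cloud: {research_cloud_pflops:.0f} PFLOPS")
```

Note the unit change along the way: a single board is measured in TeraFlops, while a pod and the Research Cloud are measured in PetaFlops.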

While Google did announce jaw-dropping benchmarks, I believe the strategic implications for Google, the industry and NVIDIA are far more interesting.

Why is the Google TPU Strategic?

1. The Cloud TPU demonstrates the potential for ASICs in Machine Learning

The first strategic implication for the industry is that Google has now demonstrated that an ASIC can deliver dramatic ML performance when placed in the hands of talented designers. And Google is making that technology available externally to accelerate the industry. The ML industry has an apparently insatiable appetite for performance, and this chip is very fast and scalable. This apparent success is also an important point to consider when we look to the upcoming launches of other ML ASICs, including Intel’s Nervana Engine, Wave Computing’s Dataflow Processing Unit, NVIDIA’s own DLA (more on that later) and others. Essentially, Google has built a chip that does one thing extremely well, devoting all the logic on the die to the math underlying the training and processing of neural networks. And by including a custom fabric to interconnect thousands of these chips, Google can use and offer supercomputer-level performance at a fraction of the price of buying systems on the open market.

But building an ASIC isn’t for the faint of heart. First, an ASIC has limited functionality. For example, the Google TPU only supports TensorFlow, leaving the users of the other major AI frameworks, championed by Microsoft, Amazon, Facebook, etc., to run on GPUs. (Accordingly, Google has also announced that it will offer the new NVIDIA V100 GPUs in the Google Cloud.) ASICs are also expensive to develop; a complex design can cost in excess of $100M and take years to design and debug. (Google claims its effort was far smaller.)

So an ASIC like the TPU only makes sense if you can amortize those costs over significant volume, typically tens or hundreds of thousands of units. But Google does need that kind of volume, so the payoff could be swift and dramatic. Note that, because it can take so long to develop an ASIC, a chip’s design may miss the window of recent innovations in a fast-moving market like AI. This is why many datacenter operators, including Baidu, Amazon and Microsoft, prefer to accelerate key workloads with GPUs as well as FPGAs from Intel and Xilinx, which deliver high performance and power efficiency while retaining the flexibility to evolve the hardware on the fly as needed. And a talented design team can implement a new FPGA design in months, not years.
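To make the amortization argument concrete, here is a back-of-the-envelope sketch. It spreads a one-time development cost over different production volumes. The $100M figure is the rough ceiling mentioned above, not Google's actual spend (which the company says was far smaller), and the volumes are hypothetical points picked to span "tens or hundreds of thousands of units". Manufacturing cost per chip is deliberately left out.

```python
# Illustrative only: how a one-time ASIC development (NRE) cost spreads over volume.
# The dollar figure and the volumes are assumptions, not disclosed Google numbers.

def nre_per_unit(nre_dollars: float, units: int) -> float:
    """Development cost carried by each chip when spread evenly over `units`."""
    if units <= 0:
        raise ValueError("units must be positive")
    return nre_dollars / units


NRE_DOLLARS = 100_000_000  # "in excess of $100M" for a complex design

for units in (10_000, 100_000, 500_000):
    per_chip = nre_per_unit(NRE_DOLLARS, units)
    print(f"{units:>7,} units -> ${per_chip:,.0f} of development cost per chip")
```

At ten thousand units the development cost alone adds $10,000 to every chip; at a hundred thousand it falls to $1,000. That is the sense in which only a buyer with Google-scale demand can justify the investment.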

2. Google now controls a complete platform for its AI-centric strategy

Google’s CEO, Sundar Pichai, has made it clear that the company’s strategy has transitioned from “Mobile First” to “AI First”. Google’s Cloud TPU is far more strategic than just having access to a cheaper alternative to GPUs. The TPU and the Google TensorFlow Framework together give the company’s engineers and data scientists a comprehensive and optimized platform to support their research and product development. Google teams can potentially gain time to market, performance and feature advantages since they control both the hardware and software for their Machine Learning enhanced products and services. The TPU could even provide a future platform to support the company’s autonomous vehicle aspirations.

Beyond the internal drivers, Google Cloud could benefit in its competition with Amazon Web Services and Microsoft Azure Cloud by offering hardware with superior price / performance for TensorFlow development projects. And of course, TensorFlow itself could benefit as well; it is already the preferred framework used by many machine learning application teams. The TensorFlow Research Cloud is intended precisely for driving preference and adoption of TensorFlow across the industry.

3. Strategic impact on NVIDIA

Obviously, Google has been a very large (perhaps even the largest) customer for NVIDIA GPUs for Machine Learning. Nearly all the work of training neural networks, at Google and elsewhere, is crunched today by NVIDIA GPUs. But NVIDIA is not standing still. Just last week, NVIDIA announced the upcoming Volta-based Tesla V100 GPU, which, among other innovations, includes a new type of processor core that yields an amazing 120 Trillion Floating Point Operations Per Second (TFLOPS). While that is roughly 2.7x higher than a single Cloud TPU chip, the Tesla V100 is a really big, and therefore really expensive, chip, measuring 815 mm² and built with 21 billion transistors, 33% larger than Pascal P100.

In fact, CEO Jensen Huang bragged in his GTC keynote address that Volta is the largest and most complex chip ever attempted. But Volta has to be big: it is much more than a Deep Learning ASIC, supporting 32- and 64-bit floating point for scientific applications as well as the error correcting memory required for massive supercomputers. IBM and NVIDIA won two of the three largest contemporary DOE Supercomputing contracts in large part because of the Volta chip. Oh, and lest we forget, it is almost certainly the world’s fastest graphics chip. But carrying all these features is precisely why an ASIC, which spends its die area on one job, can potentially outperform a GPU.

Obviously, the Google Cloud TPU will have an impact on NVIDIA, but the magnitude may be less than you might think, at least in the short term, because of the very constraints I discussed above. The Cloud TPU will only be used internally at Google and in Google Cloud hosted Compute Engines, since Google has no interest in becoming a merchant silicon vendor, especially to its competitors. It can only be used in projects that are using the TensorFlow Framework, and even then only in projects where running the application in a public cloud infrastructure makes security, business and regulatory sense. Even if Google drove, say, 20% of NVIDIA’s GPU sales for Machine Learning, that business is growing nearly 3x per year (186% in the latest quarter), so growth elsewhere will somewhat offset the losses at Google.
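A rough sketch of that offset, using the illustrative numbers in the paragraph above. It assumes, purely for the sake of argument, that the 186% growth rate applies uniformly to NVIDIA's non-Google Machine Learning business over the next year and that Google's purchases either vanish or hold flat. None of this is NVIDIA guidance; the function and parameter names are hypothetical.

```python
# Hypothetical scenario: how losing Google's share changes next year's ML GPU revenue.
# Revenue today is normalized to 1.0; all inputs are illustrative assumptions.

def relative_revenue(google_share: float, growth: float,
                     google_retained: float = 0.0) -> float:
    """Next-year revenue relative to today's.

    google_share    -- fraction of today's revenue that comes from Google
    growth          -- growth rate of the non-Google business (1.86 means +186%)
    google_retained -- fraction of today's Google revenue that remains (no growth)
    """
    other = (1.0 - google_share) * (1.0 + growth)
    google = google_share * google_retained
    return other + google


no_loss = relative_revenue(google_share=0.0, growth=1.86)
total_loss = relative_revenue(google_share=0.20, growth=1.86)
flat_google = relative_revenue(google_share=0.20, growth=1.86, google_retained=1.0)

print(f"No Google exposure at all:    {no_loss:.2f}x today's revenue")
print(f"Google (20%) lost entirely:   {total_loss:.2f}x today's revenue")
print(f"Google (20%) stays flat:      {flat_google:.2f}x today's revenue")
```

Under these assumptions the business still more than doubles even if every Google purchase disappears, which is why the near-term damage looks contained. The sketch says nothing about what happens if growth elsewhere slows.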

Nonetheless, as I said above, the TPU will soon be joined by other Machine Learning ASICs, which will not likely share the self-imposed limitations of the Google TPU. This is where NVIDIA’s own “TPU-like” Deep Learning Accelerator (DLA) may come into play. As I discussed in a recent article on the topic, NVIDIA has announced it will open source the accelerator technology used in its next generation autonomous driving SoC, code-named “Xavier”. NVIDIA’s move implies that the company could respond to any competitive threats that ASICs might present, should ASICs become more pervasive and compelling in other application segments and with other customers. And NVIDIA can extend its technology ecosystem beyond markets that the company chooses to address directly.

Conclusions

Google is attempting to build a dominant position in Artificial Intelligence, from optimized search services, to Android capabilities, to autonomous vehicles. Having complete control of the required technology stack enables it to optimize its technology while lowering its CapEx compared to buying technology from the outside. The synergies of controlling the Cloud TPU and TensorFlow should give the company a strategic competitive advantage, while accelerating the underlying science of AI and the industry at large. The impact on NVIDIA may be real, but relatively contained, at least for now. The longer-term impact on the GPU for AI, which has built NVIDIA’s impressive growth engine, remains to be seen.