26 Jun ISC17: HPC Embraces Diversity As AMD & ARM Up The Ante Vs. Intel, IBM And NVIDIA

Renewed Competition in HPC Will Enable Workload Optimization

Intel has enjoyed a dominant position in High Performance Computing (HPC) processors for nearly a decade now, with only IBM POWER offering a viable CPU alternative since Advanced Micro Devices effectively, albeit perhaps unintentionally, exited the market in 2010. Meanwhile, NVIDIA has kept things interesting by delivering GPUs as computational accelerators for those workloads that can be sufficiently parallelized and whose kernels have memory footprints small enough to fit in the relatively small 16-32 GB available per GPU. More recently, Intel has reinforced its leadership in CPUs by bringing its multi-core Xeon Phi (Knights Landing) to the forefront, especially for similar highly threaded workloads that demand more memory. Now, things are about to get even more interesting as AMD, Intel, IBM and ARM Holdings’ partners Cavium and Qualcomm ready new server SOCs, touting more cores, more throughput and more memory bandwidth, presumably at lower prices, to attract the HPC crowd and cloud service providers to their camps.

AMD rolled out a new chip with a new brand & logo for its 32-core server SOC. (Source: Advanced Micro Devices)

Don’t think about this evolution as just another inning of the old faster-and-cheaper ball game. Rather, each of these architectures (Intel Xeon and Xeon Phi, NVIDIA and AMD GPUs, AMD EPYC CPUs, ARM SOCs and IBM POWER CPUs) will attract the specific workloads that are inherently aligned with its strengths. We’ve seen this game played out before, as SPARC, MIPS, POWER and Itanium all vied for position and market share until all but IBM POWER fell by the wayside. The industry is about to become more diverse once again, and HPC users will need to balance potential performance against the cost of application optimization and portability in making their buying decisions. This renewed diversity will likely lead to significantly more competition and innovation in the HPC market.

What’s New In Processor Land?

Let’s start with the current leaders: Intel, IBM and NVIDIA. Each is in the early innings of a major product refresh, with Intel bringing out Knights Mill later this year to add (sorely missing) Deep Learning features to its Xeon Phi product, and the new Purley-based Xeon Scalable Processor (Skylake) Family expected soon. While Skylake is perhaps a bit later than initially planned, by the end of this year Intel’s portfolio will be stronger than ever. In fact, Xeon Skylake will more than double the floating point performance of its worthy predecessor, Broadwell, for HPC thanks to the addition of AVX-512 vector processing, currently only available in the Knights Landing (KNL) Xeon Phi family. And the chip will sport Intel’s Omni-Path, the on-die fast interconnect.
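To see where the roughly twofold floating point gain comes from, here is a back-of-the-envelope sketch of peak double-precision operations per core per clock cycle as a function of vector width. The function name and the assumption of two fused multiply-add (FMA) units per core on both chips are mine, not vendor-published figures quoted in this article, and the sketch ignores clock speed and core count differences.

```python
def dp_flops_per_cycle(vector_bits, fma_units=2):
    """Peak double-precision FLOPs per core per cycle.

    Assumption (not from the article): each core has `fma_units`
    FMA pipes, and each FMA counts as two floating point operations.
    """
    lanes = vector_bits // 64        # 64-bit doubles per vector register
    return lanes * 2 * fma_units     # multiply + add on every lane


avx2 = dp_flops_per_cycle(256)       # 256-bit vectors (Broadwell-class)
avx512 = dp_flops_per_cycle(512)     # 512-bit vectors (Skylake-class)

print(f"256-bit: {avx2} DP FLOPs/cycle/core")
print(f"512-bit: {avx512} DP FLOPs/cycle/core")
print(f"ratio:   {avx512 / avx2:.1f}x")
```

Under these assumptions the wider vectors alone account for a 2x gain per core per cycle; anything beyond that would have to come from additional cores or other improvements.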

Meanwhile IBM is readying its highly anticipated POWER9 processor for deliveries later this year. This architecture has already been awarded two significant HPC projects by the US DOE: the Oak Ridge Summit and the Lawrence Livermore Sierra supercomputers. Both of these mammoth systems also use NVIDIA’s new Volta GPU and the Mellanox InfiniBand interconnect, built into nodes consisting of a pair of POWER9 processors and six NVIDIA Volta GPUs. Scaling to 4600 nodes to hit a targeted 150 petaflops of peak performance, and perhaps reaching as high as 200 petaflops, these systems may vie for the coveted mantle of the fastest supercomputer in the world. Both IBM POWER9 and NVIDIA Volta are technological tours de force, being two of the largest and most complex silicon devices ever produced. The high level of GPU scaling also makes these systems ideal for conducting research in machine learning, with the NVIDIA Volta delivering up to 120 teraflops of mixed-precision deep learning performance per GPU.
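As a rough sanity check on those numbers, the sketch below works out what the quoted node count and system targets imply per GPU. It uses only the figures cited above (4600 nodes, six GPUs per node, 150 to 200 petaflops, 120 teraflops of deep learning throughput per GPU); the variable and function names are illustrative, and the arithmetic deliberately ignores the contribution of the POWER9 CPUs.

```python
NODES = 4600
GPUS_PER_NODE = 6
DL_TERAFLOPS_PER_GPU = 120           # deep learning peak quoted above

total_gpus = NODES * GPUS_PER_NODE


def per_gpu_teraflops(system_petaflops):
    """Teraflops each GPU must supply if GPUs alone hit the system target."""
    return system_petaflops * 1000 / total_gpus


# Aggregate deep learning peak, converted from teraflops to exaflops.
dl_peak_exaflops = total_gpus * DL_TERAFLOPS_PER_GPU / 1_000_000

print(f"GPUs in the system:        {total_gpus}")
print(f"Per GPU at 150 petaflops:  {per_gpu_teraflops(150):.2f} TF")
print(f"Per GPU at 200 petaflops:  {per_gpu_teraflops(200):.2f} TF")
print(f"Deep learning peak:        {dl_peak_exaflops:.2f} exaflops")
```

The gap between the single-digit teraflops each GPU needs to contribute to the headline target and the 120 teraflops quoted for deep learning shows why these machines look so attractive for machine learning research.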

So, with that as a backdrop, AMD has announced its new EPYC x86 architecture with up to 32 cores and 64 threads per CPU. AMD has a rich heritage of HPC technology and relationships, which it hopes to exploit with this new feature-rich CPU. Strategically, AMD believes that the money is not in the huge “Capability” supercomputers mentioned above, but in the much larger market for general purpose “Capacity” HPC machines needed by industry and academia to conduct real work. There, memory capacity and I/O are often more important than raw compute, since these machines run many diverse workloads at the same time on a large shared infrastructure. While I expect Intel Skylake to race past EPYC for some HPC applications and benchmarks, thanks to the 512-bit vector processor, many applications may be a better fit for AMD’s design, which boasts 8 memory channels and 64 PCIe lanes for I/O and GPU connectivity.

And this brings us to the ARM camp, where I personally had a near-death experience trying to build and market ARM-based CPUs at the now-defunct startup Calxeda. ARM server CPUs have been a big disappointment over the last 5 years, never delivering the promised performance and efficiency. That may be about to change, now that companies like Cavium are bringing out their second generation of 64-bit CPUs built on 14nm technology for the datacenter. Qualcomm is expected to try to leapfrog the pack with its 10nm SOC later this year or early next. The Cavium team has quite a few Calxeda veterans in its midst, and they appear to have learned well from that experience. The upcoming ThunderX2 SOC was demonstrated at ISC17 in Frankfurt, running Red Hat Linux, although the specs remain somewhat vague. Nonetheless, it appears that ARM is (still) right around the corner with interesting datacenter CPUs that will target the HPC market. European HPC customers like the Barcelona Supercomputing Center have shown a penchant for ARM, seeing it as an avenue for fostering European high tech investments.

Conclusions

With so many products just announced, but only about to begin initial production deliveries, it is impossible to pick a winner. Certainly, NVIDIA’s lead in GPUs, especially with Volta, will be a tough act for AMD to follow, while the AMD EPYC CPU shows a lot of promise. Note that the upcoming AMD Vega GPUs do not support native double-precision floating point or Error Correcting Memory, which is a deal breaker for most (but not all) HPC workloads. (In full disclosure, I was VP Marketing at AMD while EPYC was being designed.) While ARM has a lot to prove to make up for years of over-promising and under-delivering, the new batch of CPUs from Cavium and Qualcomm shows promise. Meanwhile, Intel looks very strong, but it has not faced a credible x86 competitor in many years. IBM POWER9 has some very impressive specs and heritage, not to mention that it is the only CPU to offer native NVLink as well as OpenCAPI for tightly integrating CPUs with GPUs and with FPGAs such as those from Xilinx.

Suffice it to say, picking a processor for your HPC workload just got a lot more difficult. But the opportunity to fine-tune your choice of CPU, GPU or FPGA to meet the needs of specific workloads and installation requirements can help lower costs and power while increasing performance. And in HPC, that’s the name of the game!