This blog series documents a middle-out model (vs. top-down and bottom-up) for measuring the impact of parallel computing choices. I built it because companies need to understand enough detail to generate useful differentiation and make competitive comparisons between systems, without getting mired in subtle details and complex low-level interactions. In this series I work my way up to simpler generalizations, such as Amdahl’s Law. Eventually I will take deeper dives toward the bottom to illuminate critical architectural differences and design choices between systems.
As the IT industry moves into its next phase of growth, SoC (system on chip) and rack-level system designers are being asked to tune the performance and power management of their hardware products for specific usage models and datacenter workloads. They are doing this by adding parallelism (for a variety of well-documented reasons). At the same time, we are challenging software developers to write more parallel code to run on these parallel resources, and the low-level software development tools available for doing so are insufficient.
The challenge to our industry today is that people, including most programmers, think serially. Most mortal humans cannot visualize parallelism, and so this problem cannot be solved by better education or by promoting the tools currently available.
We need better system-level performance models to make better decisions regarding which parallel hardware architectures and software programming techniques should be used with specific classes of applications and workloads. To build those new models, we need to break with the past and think differently about system performance.
The state of our industry today is that truly serial runtime code is very rare. Virtually all runtime code is parallel to some extent. Applications processors are already multi-core – they are moving from dual- to quad-core and soon beyond. At the same time, they are also integrating interesting compute offload hardware – GPUs, DSPs, fixed function blocks, etc. And it is reasonable to expect that mobile derivative multi-core SoCs with compute offload capabilities will draft ARM cores into the server market as system vendors adopt “smartphone” processors for server workloads.
Gene Amdahl’s eponymous equation is a general, theoretical, top-down definition of the relative speedup that might be obtained by increasing the parallelism of a specific system. It is somewhat useful for describing the high-level effects of adding or removing parallel processing resources in an existing system, or of tuning the balance of “serial” vs. parallel code. But it provides no insight into the performance trade-offs of today’s heterogeneous architectures and says nothing at all about power consumption. In other words, Amdahl’s Law is more what you’d call a guideline than an actual law.
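For readers who haven’t seen the equation itself, here is a minimal sketch of the standard textbook form of Amdahl’s Law (the function name, the 95% parallel fraction, and the processor counts below are illustrative choices, not figures from this series):

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Predicted speedup when a fraction p of the work is parallelizable
    and runs on n processors; the remaining (1 - p) stays serial."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the code parallelized, speedup saturates as n grows,
# because the serial 5% puts a hard ceiling of 1/0.05 = 20x on the system.
for n in (2, 4, 16, 1024):
    print(f"n = {n:5d}: predicted speedup = {amdahl_speedup(0.95, n):.2f}x")
```

The saturation behavior is exactly why the law is “more of a guideline”: it bounds speedup from serial/parallel balance alone, while ignoring heterogeneity, communication overhead, and power.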
Amdahl’s Law Part 1: What Does Parallelism Mean?
Amdahl’s Law Part 2: Is Amdahl’s Law Still Relevant?
Amdahl’s Law Part 3: Modeling Execution Time and Power Consumption
Amdahl’s Law Part 4: Parallel Performance – Execution Time Detail
Amdahl’s Law Part 5: Parallel Performance – Power Consumption Detail
Amdahl’s Law Part 6: How to Measure and Compare Parallel Systems