What Is a GPU? The Hardware at the Heart of the AI Boom

A GPU is a graphics processing unit.

The name is a relic of its origins. GPUs were designed to render graphics, to take the mathematical calculations required to display images on a screen and do them very, very fast. Game developers needed hardware that could compute the color of millions of pixels simultaneously, updating dozens of times per second. The chip manufacturers built it. And then, somewhat accidentally, the AI field discovered that the same hardware that was good at rendering graphics was extraordinarily good at training neural networks.

The reason comes down to a fundamental difference in how CPUs and GPUs are designed to work.

A CPU, the central processing unit that serves as the main brain of a conventional computer, is optimized for sequential tasks. It has a relatively small number of very powerful cores, typically somewhere between a handful and a few dozen in a modern consumer chip (server chips can have more), each capable of handling complex, branching logic quickly. CPUs are good at tasks where each step depends on the result of the previous step, where decisions need to be made based on intermediate results, and where the work doesn't decompose cleanly into independent parallel operations. Most conventional software is written this way.

A GPU takes the opposite approach. Rather than a small number of powerful cores, it has thousands of simpler cores designed to do the same operation on many pieces of data simultaneously. A high-end NVIDIA GPU used for AI training can have more than ten thousand cores operating in parallel. Each individual core is less capable than a CPU core, but the aggregate throughput of thousands of them working simultaneously on the same computation is vastly higher than a CPU can achieve on the same task.
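The contrast between the two kinds of work can be sketched in a few lines of Python. This is only an illustration: the function names `collatz_steps` and `scale_all` are hypothetical, and the code runs on an ordinary CPU, so it shows the shape of each workload rather than actual GPU execution. In the first function every step branches on the previous result, so the steps cannot run side by side. In the second, each element is handled independently, which is the pattern a GPU spreads across its many cores.

```python
def collatz_steps(n):
    """Sequential, branching work: each step depends on the previous value."""
    steps = 0
    while n != 1:
        # The branch taken depends on the intermediate result, so step k+1
        # cannot start until step k has finished.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def scale_all(values, factor):
    """Data-parallel work: the same operation applied to every element."""
    # No element's result depends on any other element, so in principle
    # each multiplication could be handed to a different core.
    return [v * factor for v in values]


print(collatz_steps(6))            # 8
print(scale_all([1, 2, 3, 4], 10)) # [10, 20, 30, 40]
```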

Neural network training maps almost perfectly onto this architecture. The core operation in training a neural network is matrix multiplication: taking large arrays of numbers and performing the same arithmetic operations on them repeatedly. This is an embarrassingly parallel problem, meaning it decomposes naturally into many independent computations that can all run at the same time. A GPU can apply the same operation to thousands of elements of a matrix simultaneously, compressing what would take a CPU hours into minutes or seconds. The scale of modern AI training, with models containing hundreds of billions of parameters updated repeatedly over enormous datasets, would simply be impractical on CPU hardware.
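The independence described above can be made concrete with a small sketch. The helper names (`output_cell`, `matmul_in_any_order`) are hypothetical, and the code is plain Python on a CPU rather than a GPU kernel. It computes each cell of a matrix product separately and in a deliberately shuffled order. Because no cell reads another cell's output, the order does not matter, and that is what lets a GPU compute many cells at once.

```python
import random


def output_cell(a, b, row, col):
    """One cell of A x B: dot product of a row of A with a column of B."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul_in_any_order(a, b, seed=0):
    """Compute A x B one cell at a time, visiting cells in shuffled order."""
    rows, cols = len(a), len(b[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    random.Random(seed).shuffle(cells)  # the visiting order is arbitrary
    result = [[0] * cols for _ in range(rows)]
    for r, c in cells:
        # Each cell reads only the inputs, never another cell's output.
        result[r][c] = output_cell(a, b, r, c)
    return result


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_in_any_order(a, b))           # [[19, 22], [43, 50]]
print(matmul_in_any_order(a, b, seed=42))  # same result, different order
```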

NVIDIA became the dominant GPU manufacturer for AI through a combination of hardware capability and software ecosystem development. Their CUDA platform, a programming framework released in 2007 that allows developers to write code that runs directly on GPU hardware, gave researchers and engineers the tools to exploit GPU parallelism for general computation well before AI demand made it commercially critical. By the time deep learning took off in the early 2010s, NVIDIA had a head start in both hardware and software that competitors have spent years trying to close. The H100 and successor chips that power most frontier AI training today represent well over a decade of compounding advantage in both silicon design and software tooling.

The economics of GPU access shape AI development in ways that matter for understanding the field. Training a frontier model requires thousands of GPUs running for weeks or months, at a cost that runs into tens or hundreds of millions of dollars. This concentrates frontier model development in a small number of organizations that can afford that infrastructure, either directly or through cloud providers. Inference, running a trained model to generate outputs, is cheaper than training but still GPU-intensive at scale. The cost of GPU compute is a meaningful constraint on which organizations can build and operate large AI systems, and on what those systems can economically do.
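A back-of-the-envelope calculation shows how these figures arise. The sketch below multiplies GPU count by running time by an hourly price. The function name `training_cost_usd` and every input value are hypothetical, chosen only to illustrate the arithmetic. Real costs also depend on factors the sketch ignores, such as networking, storage, failed runs, and negotiated pricing.

```python
def training_cost_usd(num_gpus, days, usd_per_gpu_hour):
    """Rough rental cost: GPUs x hours x hourly price per GPU."""
    gpu_hours = num_gpus * days * 24
    return gpu_hours * usd_per_gpu_hour


# Hypothetical inputs, not measurements of any real training run.
cost = training_cost_usd(num_gpus=10_000, days=60, usd_per_gpu_hour=2.0)
print(f"${cost:,.0f}")  # $28,800,000
```

Under these assumed inputs, a single two-month run lands in the tens of millions of dollars. A larger cluster or a longer run pushes the total higher.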

The geopolitical dimension of GPUs reflects their strategic importance. Advanced AI chips depend on semiconductor manufacturing processes so sophisticated that only a handful of facilities in the world can run them. The US government has implemented export controls restricting the sale of high-end AI chips to certain countries, most notably China, on the grounds that the most capable AI hardware has direct national security implications. The resulting scramble to develop domestic chip manufacturing capability, and to find ways around export controls, has made GPU supply chains a live geopolitical issue in a way that general-purpose computing hardware never was.

Alternative AI hardware is an active area of development precisely because GPU dependence creates bottlenecks and costs that the industry wants to reduce. Google has developed its own AI accelerator chips, called TPUs (Tensor Processing Units), optimized specifically for the matrix operations neural networks require rather than for general GPU workloads. A number of startups are building specialized AI chips with different architectural approaches. The goal in each case is hardware better matched to the specific computational demands of AI than GPUs, which began as graphics processors and were adapted to a new use rather than designed from the ground up for machine learning, even though recent data-center GPUs add dedicated matrix hardware.

For practitioners who don't work directly with infrastructure, the practical implication of GPU constraints is that AI capability is not free and not infinite. The compute available for training a model determines its size and the amount of data it can be trained on. The compute available for inference determines how quickly and cheaply it can respond to requests. Cost and latency tradeoffs in AI deployment are ultimately GPU economics, and understanding that the hardware underneath is both genuinely scarce and genuinely expensive provides useful context for why AI systems are priced, limited, and architected the way they are.
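The inference side of that tradeoff can be sketched the same way. The function names and all numbers below are hypothetical, and the sketch assumes each GPU is kept fully busy, which real deployments rarely manage. It estimates how many GPUs a given peak load requires and what each request costs in GPU time.

```python
import math


def gpus_needed(peak_requests_per_second, requests_per_second_per_gpu):
    """GPUs required to serve the peak load, rounded up to whole devices."""
    return math.ceil(peak_requests_per_second / requests_per_second_per_gpu)


def cost_per_request_usd(usd_per_gpu_hour, requests_per_second_per_gpu):
    """GPU cost of one request, assuming the GPU is fully utilized."""
    requests_per_hour = requests_per_second_per_gpu * 3600
    return usd_per_gpu_hour / requests_per_hour


# Hypothetical inputs, chosen only to illustrate the arithmetic.
print(gpus_needed(1200, 5))                        # 240
print(f"{cost_per_request_usd(2.0, 5):.6f}")       # 0.000111
```

A slower or larger model serves fewer requests per second on each GPU, which raises both the number of GPUs needed and the cost of every response.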