Matrix multiplication is the clean math story. The machine doing it is messier. Charges move. Transistors switch. Bits travel through memory hierarchies and across interconnects. Power gets delivered. Heat has to leave. If you ignore that physical machine, you will miss why real AI systems behave the way they do.
The job sounds simple: take huge piles of numbers and produce outputs fast enough to train or run modern models.
Real work happens through switching devices, charge movement, signaling, storage, and transport across physical hardware.
Getting operands to the right place at the right time is often harder than doing the multiply-add itself.
Every watt spent moving and switching information has to be delivered and then removed as heat.
The math is not the machine. The machine is a choreography of switching, transport, synchronization, power delivery, and thermal limits. AI performance lives inside that choreography.
During training or inference, the system keeps doing a simple-looking thing at huge scale: multiply numbers, accumulate results, apply nonlinearities, move tensors, repeat. The software view says this is linear algebra. Fair. But incomplete.
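That simple-looking loop body fits in a few lines. A minimal NumPy sketch, with shapes and names invented purely for illustration:

```python
import numpy as np

# Illustrative only: the thing a model repeats at huge scale.
def layer(x, w, b):
    y = x @ w + b            # multiply numbers, accumulate results
    return np.maximum(y, 0)  # apply a nonlinearity (ReLU here)

x = np.random.randn(32, 512)   # a batch of activations
w = np.random.randn(512, 512)  # a weight matrix
b = np.zeros(512)
out = layer(x, w, b)           # move tensors, repeat
print(out.shape)               # (32, 512)
```

Everything below this section is about what the machine has to do so that those few lines can execute billions of times per second.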
GPUs are built to perform vast amounts of similar work in parallel. They are good when the same kind of operation needs to happen many times over large arrays of data. That is why AI landed on them so hard.
At the bottom, semiconductor devices switch states and control current flow. Logic is embodied in physical switching behavior, not floating math symbols.
Registers, caches, SRAM, DRAM. Different storage layers trade speed, density, distance, and energy. That hierarchy shapes what the chip can feed into compute units.
Signals move through wires, traces, packages, memory channels, and board-level links. Every hop has latency, bandwidth limits, integrity issues, and energy cost.
A fused multiply-add unit can be extremely fast. But if data is late, the expensive compute hardware waits. So system designers obsess over locality, caching, tiling, batching, and reuse. They are fighting the cost of transport.
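A toy version of that fight is a blocked matrix multiply, where each tile of the operands is loaded once and then reused across a whole block of output. Real kernels tile for registers, shared memory, and caches simultaneously; this sketch (tile size and shapes arbitrary) only shows the access pattern:

```python
import numpy as np

# Tiling sketch: compute C in TILE x TILE blocks so the operands for
# each block can sit in fast, nearby memory while they are reused.
def tiled_matmul(a, b, tile=64):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % tile == n % tile == k % tile == 0
    c = np.zeros((m, n))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # each small block is fetched once, reused many times
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return c

a = np.random.randn(128, 128)
b = np.random.randn(128, 128)
assert np.allclose(tiled_matmul(a, b), a @ b)
```

The arithmetic is identical to the untiled version; only the order of memory traffic changes, and that reordering is the entire point.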
Real performance often depends less on the raw math capability and more on whether the system can keep the compute units supplied with data from nearby memory rather than far-away memory.
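A back-of-envelope way to check that is a roofline-style estimate: compare a kernel's FLOPs per byte of data moved against the machine's compute-to-bandwidth ratio. The peak and bandwidth figures below are assumptions for the sketch, not any specific chip:

```python
def attainable_tflops(M, K, N, peak_flops=100e12, mem_bw=2e12):
    """Roofline-style bound for an (M,K) x (K,N) fp32 matmul."""
    flops = 2 * M * K * N
    bytes_moved = 4 * (M * K + K * N + M * N)  # each tensor moved once
    intensity = flops / bytes_moved            # FLOPs per byte
    return min(peak_flops, intensity * mem_bw) / 1e12

# Big square matmul: lots of reuse, compute-bound.
print(attainable_tflops(4096, 4096, 4096))  # hits the assumed 100 TFLOP/s peak
# Matrix-vector (e.g. batch-1 inference): almost no reuse, memory-bound.
print(attainable_tflops(4096, 4096, 1))     # roughly 1 TFLOP/s
```

Same math units, same peak rating, two orders of magnitude apart in delivered performance, purely because of how much data has to move per operation.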
High-performance compute needs stable power at large scale. Delivering that power cleanly is part of the system design problem.
Switching and transport consume energy. Energy turns into heat. If heat cannot leave fast enough, clocks, density, packaging, and total sustained performance all get squeezed.
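The standard first-order model behind that squeeze is dynamic switching power, roughly P ≈ α·C·V²·f: activity factor times switched capacitance times voltage squared times clock frequency. Every number in this sketch is an invented illustrative value, not a real chip:

```python
# Dynamic switching power, first-order model: P ~ alpha * C * V^2 * f.
alpha = 0.1   # activity factor: fraction of capacitance switching per cycle
C = 1e-9      # effective switched capacitance, farads (illustrative)
V = 0.8       # supply voltage, volts
f = 2e9       # clock frequency, Hz

P = alpha * C * V**2 * f
print(P)  # watts for this one block; 0.128 with these numbers
```

The V² term is why voltage and frequency get dialed down together under thermal pressure: pushing clocks up usually means pushing voltage up too, so power climbs much faster than performance.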
Now infrastructure matters too: rack power, cooling design, system packaging, and datacenter constraints become part of the AI story.
You stop asking only “how many FLOPS?” and start asking “where are the operands coming from, what is the memory path, how much energy does this transport cost, and what happens thermally when we sustain this workload?”
The workload asks for linear algebra.
The chip negotiates that request through switching, storage, transport, and timing.
Data movement keeps charging you, in latency, bandwidth pressure, and energy.
Sustained performance only exists if the system can physically survive its own activity.
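To make the "keeps charging you" point concrete, here are rough orders of magnitude for energy per operation, loosely after widely quoted figures for an older process node (e.g. Horowitz, ISSCC 2014). Real values vary a lot by node and design; treat these as illustrative ratios only:

```python
# Illustrative energy per operation, in picojoules (not any real chip).
energy_pj = {
    "fp32 multiply-add":  4,
    "small SRAM read":    10,
    "large cache read":   50,
    "off-chip DRAM read": 1300,
}

flop = energy_pj["fp32 multiply-add"]
for op, pj in energy_pj.items():
    print(f"{op:>18}: {pj:5d} pJ (~{pj // flop}x a multiply-add)")
```

If fetching an operand from DRAM costs hundreds of times the energy of the multiply-add that consumes it, the energy budget of a sustained workload is dominated by transport, not arithmetic.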
Why memory movement is often harder than the compute
That section zooms in on hierarchy, locality, bandwidth, latency, reuse, and why moving data can dominate total system cost.
Why optics keeps showing up in modern systems
That expands the lens from electrical compute bottlenecks into modulation, interference, fiber, photonics, and sensing.