The Infrastructure Trap
Meanwhile (part 2)
The electric grid gets $200B in federal funding: "AI datacenters need more capacity..."
Natural gas pipelines expand: "Peaker plants to handle compute demand spikes..."
Utilities build substations next to hyperscale campuses: "Dedicated 500MW feed for the new H100 cluster..."
No one asks why a datacenter drawing 100MW has GPUs running at 60% utilization, sitting idle 40% of the time.
TBF
The infrastructure cascade
Utility executives: "We need more generation capacity."
Grid operators: "We need more transmission lines."
Datacenter operators: "We need more power."
Reality: You're feeding an inefficient system more electricity.
Your GPUs aren't starving for power. They're starving for the right workload at the right time.
You're building a bigger highway to a parking lot full of idle containers.
TBF
The numbers no one connects
- $50B: Annual investment in datacenter power infrastructure (2024-2026)
- 60%: Average GPU utilization across hyperscale AI clusters
- 40%: Electricity wasted on idle chips, drawing power while waiting for data
If schedulers knew where to route workloads, you wouldn't need half the infrastructure.
If orchestrators understood topology, you wouldn't need the next substation.
If drivers exposed CU-level state, you could fill the idle capacity you already paid for.
But instead:
- Utilities build more capacity
- Chip vendors build faster silicon
- Cloud providers buy more GPUs
- Power consumption doubles every 18 months
And no one asks why the chips that are already installed are sitting idle.
TBF
What this costs
Per hyperscale cluster (100,000 H100s):
- Power draw: 100MW continuous
- Annual electricity: $50M (100MW × 8,760 hours ≈ 876 GWh, at an industrial rate of roughly $0.057/kWh)
- At 60% utilization: $20M/year (40% of that bill) wasted on idle silicon
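The per-cluster figures are straightforward arithmetic. Here is a minimal sketch, assuming a flat 100MW draw for all 8,760 hours of a year and counting every idle hour as full-power waste, the same simplification the figures above make. The $0.057/kWh rate is an assumption backed out of the $50M figure, not a quoted tariff.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h, ignoring leap years


def annual_cost_usd(power_mw: float, rate_usd_per_kwh: float) -> float:
    """Electricity cost of a constant load that runs all year."""
    energy_kwh = power_mw * 1_000 * HOURS_PER_YEAR  # MW -> kW, times hours
    return energy_kwh * rate_usd_per_kwh


def idle_cost_usd(total_cost_usd: float, utilization: float) -> float:
    """Share of the bill attributed to idle time.

    Assumes power draw is the same whether or not the chips are busy,
    which overstates waste if idle GPUs throttle down.
    """
    return total_cost_usd * (1.0 - utilization)


# Assumed inputs mirroring the figures above; the rate is hypothetical.
POWER_MW = 100.0
RATE_USD_PER_KWH = 0.057
UTILIZATION = 0.60

total = annual_cost_usd(POWER_MW, RATE_USD_PER_KWH)
wasted = idle_cost_usd(total, UTILIZATION)
print(f"annual electricity: ${total / 1e6:.1f}M")  # → annual electricity: $49.9M
print(f"idle share:         ${wasted / 1e6:.1f}M")  # → idle share:         $20.0M
```

Changing `UTILIZATION` shows how directly the idle share follows the scheduler: every 10 points of utilization recovered on this cluster is about $5M/year under these assumptions.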
Multiply across industry:
- Estimated 2M GPUs deployed globally (2025)
- ~40MW wasted capacity per 100K GPU cluster
- $400M/year in electricity powering idle compute units (20 cluster-equivalents × $20M)
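The industry-wide number comes from scaling the per-cluster estimate linearly: 2M GPUs is 20 cluster-equivalents of 100K GPUs, and each one wastes about $20M/year. This is a minimal sketch under a strong assumption. It treats every GPU as if it sat in a cluster with the same power draw, electricity rate, and 60% utilization, and real fleets will not match that.

```python
GPUS_PER_CLUSTER = 100_000  # reference cluster size used above


def fleet_idle_cost_usd(total_gpus: int, idle_cost_per_cluster_usd: float) -> float:
    """Scale a per-cluster idle cost linearly to a whole fleet of GPUs."""
    cluster_equivalents = total_gpus / GPUS_PER_CLUSTER
    return cluster_equivalents * idle_cost_per_cluster_usd


# Inputs taken from the figures above; linear scaling is the assumption.
TOTAL_GPUS = 2_000_000
IDLE_COST_PER_CLUSTER_USD = 20_000_000  # USD/year per 100K-GPU cluster

print(fleet_idle_cost_usd(TOTAL_GPUS, IDLE_COST_PER_CLUSTER_USD) / 1e6)  # → 400.0
```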
And the solution everyone funds:
- Build more generation capacity
- Lay more transmission lines
- Install bigger cooling systems
Not:
- Fix the scheduler so the GPUs aren't idle
TBF
The equation every utility engineer knows
When AI datacenters call for new service, the first question is always:
P = V × I × PF × √3
[the problem no one talks about]
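For a balanced three-phase load, V is the line-to-line voltage, I is the line current, and PF is the power factor. Solving for current tells the utility how heavy the feed has to be. The sketch below is a minimal worked example. The 138kV service voltage and 0.95 power factor are illustrative assumptions and are not taken from any specific project.

```python
import math


def line_current_amps(power_w: float, v_line_to_line: float, power_factor: float) -> float:
    """Solve P = V x I x PF x sqrt(3) for I (balanced three-phase load)."""
    return power_w / (v_line_to_line * power_factor * math.sqrt(3))


def real_power_w(v_line_to_line: float, current_a: float, power_factor: float) -> float:
    """Forward form of the same equation: P = V x I x PF x sqrt(3)."""
    return v_line_to_line * current_a * power_factor * math.sqrt(3)


# Hypothetical example: the 500MW campus feed at 138kV, PF 0.95.
amps = line_current_amps(500e6, 138e3, 0.95)
print(f"{amps:,.0f} A")  # → 2,202 A
```

Roughly 2,200 A per phase at transmission voltage is why these requests lead straight to new substations and lines.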
The invisible dependency
Natural gas futures spike when OpenAI announces a new training run.
Grid operators schedule maintenance around hyperscaler launch windows.
Utilities treat AI datacenters as if they were weather events: unpredictable demand spikes that require reserve capacity.
Because no one can predict which GPUs will be busy and when.
Your infrastructure team is building reserve capacity for inefficiency. That capacity is not for growth. It exists because no one knows where workloads will land.
TBF
From 120,000 feet
The grid doesn't need more capacity.
The schedulers need to stop treating 128 compute units as "one device" and randomly slamming workloads into whichever device has RAM available.
The orchestrators need to understand which CUs are idle, which are thrashing, and route accordingly.
The telemetry needs to expose intra-device topology, not aggregate metrics.
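Here is the difference in concrete terms. The device-level policy picks whichever GPU has memory free. A CU-aware policy also checks how many compute units are actually idle. This is a minimal sketch built on a hypothetical per-CU busy-fraction reading (the `cu_busy` list), which today's drivers generally do not expose. The `Device` type, the 0.2 idle threshold, and both placement functions are illustrative and are not any real scheduler's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    free_mem_gb: float
    # Hypothetical per-CU busy fraction (0.0 = idle, 1.0 = saturated).
    cu_busy: list[float] = field(default_factory=list)

    def idle_cus(self, threshold: float = 0.2) -> int:
        """Count CUs whose busy fraction is below the threshold."""
        return sum(1 for b in self.cu_busy if b < threshold)


def place_by_memory(devices: list[Device], mem_gb: float) -> Device | None:
    """Device-level placement: first device with enough free memory."""
    for d in devices:
        if d.free_mem_gb >= mem_gb:
            return d
    return None


def place_by_cu_state(devices: list[Device], mem_gb: float, cus_needed: int) -> Device | None:
    """CU-aware placement: among devices with enough memory and idle CUs,
    prefer the one with the most idle CUs."""
    candidates = [d for d in devices
                  if d.free_mem_gb >= mem_gb and d.idle_cus() >= cus_needed]
    return max(candidates, key=lambda d: d.idle_cus(), default=None)


# 128 CUs per device, as in the text; the busy patterns are made up.
gpu0 = Device("gpu0", free_mem_gb=40, cu_busy=[0.95] * 120 + [0.05] * 8)
gpu1 = Device("gpu1", free_mem_gb=30, cu_busy=[0.10] * 96 + [0.90] * 32)
devices = [gpu0, gpu1]

print(place_by_memory(devices, mem_gb=24).name)                   # → gpu0
print(place_by_cu_state(devices, mem_gb=24, cus_needed=32).name)  # → gpu1
```

The memory-only policy puts the job on gpu0, where 120 of 128 CUs are already saturated. The CU-aware policy sends it to gpu1, which has 96 idle CUs.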
Fix the logistics, and half your power infrastructure funding disappears.
But that would require asking the question no one wants to ask:
What if we're funding the wrong layer?
TBF