2026-06-26 · 11 min read

Chase Cheap Power, Not GPUs: An AI Infra Siting Playbook for CTOs

By Diogo Hudson Dias

A hydroelectric dam in Brazil at dusk with transmission towers leading to a lit, modern data center building nearby.

In 2026, your biggest AI constraint isn’t GPUs. It’s electrons. Investors are funding power-first startups to feed AI data centers, ex–big-tech AI leaders are openly pushing for 10–1000x power efficiency, and grid interconnection queues in many US regions are measured in years, not quarters. If you treat siting as an afterthought, your AI cost curve will be wrong by an order of magnitude — or you’ll simply be boxed out of capacity.

This isn’t a philosophical point. Power price, availability, and facility efficiency (PUE, power usage effectiveness: total facility energy divided by IT energy) now determine three non-negotiables: unit economics, capacity timing, and carbon disclosures that your enterprise customers increasingly require. The good news: you don’t need to become a data center developer. You need a siting strategy and a procurement checklist that translate into actual margin.

First principles: why siting now dominates AI economics

Power cost scales with your fleet. A single H100-class GPU draws roughly 700 W at load. Multiply that by hundreds or thousands of GPUs, gross it up by PUE (often 1.2–1.6 outside hyperscalers), and you are buying megawatts, even if you “just rent GPUs.” Providers pass that power cost through.
Power availability gates GPUs. Many regions cannot add new multi-megawatt loads quickly. If your provider can’t secure capacity, your reservations slip. This is why you should ask not just “Do you have H100s?” but “What substation and what interconnection queue are we riding?”
Latency only matters for what’s interactive. Most AI work is batch or delay-tolerant: embedding generation, fine-tuning, nightly index rebuilds, model distillation. Run those where power is cheap and cool. Keep only truly interactive inference near users.
Carbon is becoming a contract term. Enterprise RFPs increasingly require 24/7 carbon reporting, not annual offsets. Regions with clean grids (hydro, wind) simplify sales and compliance.

The math that changes decisions

Let’s put round numbers on it using typical published GPU power draws (GPU board power only; host CPUs, memory, and networking add to the real IT load):

H100/H200-class GPU: ~700 W TDP at load
L40S-class GPU: ~350 W at load
PUE: 1.2 (good hyperscale) to 1.6 (legacy colo)

Example A: a 512-GPU training cluster (H100-class), 70% average utilization, PUE 1.3.

IT load ≈ 512 × 0.7 × 700 W ≈ 250 kW
Facility load ≈ 250 kW × 1.3 ≈ 325 kW
Monthly energy ≈ 325 kW × 720 h ≈ 234 MWh
At $0.05/kWh: ≈ $11.7k/month; at $0.15/kWh: ≈ $35k/month

Example B: 5,000 L40S for high-throughput inference at 40% utilization, PUE 1.2.

IT load ≈ 5,000 × 0.4 × 350 W = 700 kW
Facility load ≈ 700 kW × 1.2 = 840 kW
Monthly energy ≈ 840 kW × 720 h ≈ 605 MWh
At $0.05/kWh: ≈ $30k/month; at $0.20/kWh: ≈ $121k/month

Is power the majority of your all-in cost? No: hardware, rent, and platform margin often dwarf raw power. But at scale the deltas are real dollars. In Example B, the gap between $0.05/kWh and $0.20/kWh is about $91k/month, or roughly $1.1M/year. More importantly, power availability and PUE drive whether your provider can even take your order next quarter.
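Because these examples are just multiplication, it helps to keep them in a script you can rerun against real quotes. A minimal Python sketch; `monthly_power_cost` is a hypothetical helper, and it makes the same simplifications as the examples: GPU draw scales linearly with average utilization, and host overhead is ignored.

```python
def monthly_power_cost(gpus: int, watts_per_gpu: float, utilization: float,
                       pue: float, usd_per_kwh: float, hours: float = 720.0) -> dict:
    """Estimate facility load and monthly energy cost for a GPU fleet.

    Simplifications (same as the worked examples): GPU draw scales linearly
    with average utilization; host CPUs, NICs, and fans are ignored.
    """
    it_kw = gpus * utilization * watts_per_gpu / 1000.0  # GPU-only IT load
    facility_kw = it_kw * pue                            # gross up by PUE
    mwh = facility_kw * hours / 1000.0                   # energy per month
    usd = facility_kw * hours * usd_per_kwh              # kWh x $/kWh
    return {"it_kw": it_kw, "facility_kw": facility_kw, "mwh": mwh, "usd": usd}


# Example A: 512 H100-class GPUs, 70% utilization, PUE 1.3
a_cheap = monthly_power_cost(512, 700, 0.7, 1.3, 0.05)
a_pricey = monthly_power_cost(512, 700, 0.7, 1.3, 0.15)
# Example B: 5,000 L40S-class GPUs, 40% utilization, PUE 1.2
b_cheap = monthly_power_cost(5000, 350, 0.4, 1.2, 0.05)
b_pricey = monthly_power_cost(5000, 350, 0.4, 1.2, 0.20)

for name, r in [("A @ $0.05", a_cheap), ("A @ $0.15", a_pricey),
                ("B @ $0.05", b_cheap), ("B @ $0.20", b_pricey)]:
    print(f"{name}: {r['facility_kw']:.0f} kW, {r['mwh']:.0f} MWh, ${r['usd']:,.0f}/month")

# Annual cost of running Example B on expensive rather than cheap power
annual_spread_b = (b_pricey["usd"] - b_cheap["usd"]) * 12
print(f"Example B annual spread: ${annual_spread_b:,.0f}")
```

Unrounded, Example A lands at about 326 kW and $11.7k/month, a hair above the figures in the text because the text rounds IT load to 250 kW.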

A CTO’s siting framework: choose by workload, not by logo

Stop asking “AWS, GPU cloud, or colo?” and start asking “For this workload, what’s my tolerance on latency and data residency, and what mix of power price, PUE, and capacity wins?”

Segment your AI by latency and data gravity

Tier 1 – Interactive inference (sub-150 ms TTFB): Chat UIs, inline code assist, search autocompletion. Must run close to users (or to the API your product calls). Aim for metro or at most one backbone hop away. Latency budgets die at 100+ ms of extra RTT. São Paulo–Miami alone is often 110–140 ms round trip, before traffic even reaches users in, say, Virginia; that blows your budget on its own.
Tier 2 – Near real-time (0.5–5 s): Post-processing, ranking reruns, lightweight function calling that can hide behind spinners or background jobs. Can travel 1–2 regions away if you pipeline correctly.
Tier 3 – Batch/tolerant (minutes to hours): Embedding generation, nightly index rebuilds, fine-tuning, model distillation, offline evaluation. Run wherever power is cheap and clean. Ship artifacts forward.
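One way to make the tiering mechanical is to record each workload’s latency budget and test whether a remote region’s extra round trip still fits. The sketch below is a hypothetical Python helper, not a standard tool; the thresholds mirror the tier definitions above, and treating the 150–500 ms gap between Tier 1 and Tier 2 as Tier 1 is our assumption.

```python
from enum import Enum


class Tier(Enum):
    INTERACTIVE = 1     # sub-150 ms TTFB
    NEAR_REAL_TIME = 2  # 0.5-5 s
    BATCH = 3           # minutes to hours


def classify(latency_budget_ms: float) -> Tier:
    """Map a workload's end-to-end latency tolerance to a tier.

    Assumption: budgets under 500 ms (including the 150-500 ms gap the tier
    definitions leave open) count as Tier 1; up to 5 s counts as Tier 2.
    """
    if latency_budget_ms < 500:
        return Tier.INTERACTIVE
    if latency_budget_ms <= 5_000:
        return Tier.NEAR_REAL_TIME
    return Tier.BATCH


def fits_remote_region(latency_budget_ms: float, base_latency_ms: float,
                       extra_rtt_ms: float) -> bool:
    """True if the added round trip still keeps the workload inside its budget."""
    return base_latency_ms + extra_rtt_ms <= latency_budget_ms


# Hypothetical chat endpoint: 150 ms budget, 80 ms spent on inference and the
# local network, plus 125 ms of extra RTT if served from a distant region.
print(classify(150), fits_remote_region(150, 80, 125))
# Hypothetical nightly index rebuild with a one-hour budget.
print(classify(3_600_000), fits_remote_region(3_600_000, 1_200_000, 140))
```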

Pick a region pattern by tier

Tier 1: Choose low-latency metros with decent power cost. In the US that often means Northern Virginia, Ohio, Dallas, Phoenix. Negotiate for modern facilities (PUE ≤ 1.3). Avoid congested coastal zones with high $/kWh unless you absolutely need the eyeball proximity.
Tier 2: Push to cheaper power grids still on fast backbones: Midwest US, Quebec, Oregon, parts of Spain/Portugal for EU users. Many offer hydro/wind-dominant mixes and materially lower carbon intensity.
Tier 3: Chase cheap, clean power globally. Hydro-dominant provinces in Canada, wind corridors in the US interior, northern Sweden/Finland in the EU, and hydro/wind in Brazil. Brazil’s grid is majority renewable (hydro + wind + growing solar), which helps carbon accounting; actual contracted rates vary but can be highly competitive for steady, high-load buyers.
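The region pattern above reduces to a constrained minimum: filter candidates by the tier’s RTT ceiling, then take the lowest effective cost per IT kWh (contracted $/kWh × PUE). A Python sketch; the `Region` fields, region names, prices, and RTT caps are all hypothetical placeholders, not quotes for any real site.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    name: str
    usd_per_kwh: float      # effective contracted rate
    pue: float              # trailing 12-month PUE
    rtt_to_users_ms: float  # measured round trip to your main user metro


def cost_per_it_kwh(region: Region) -> float:
    # Each kWh delivered to IT gear costs PUE x $/kWh at the meter.
    return region.usd_per_kwh * region.pue


def pick_region(regions: List[Region], max_rtt_ms: float) -> Optional[Region]:
    """Cheapest region (per IT kWh) whose RTT fits the tier's latency budget."""
    eligible = [r for r in regions if r.rtt_to_users_ms <= max_rtt_ms]
    return min(eligible, key=cost_per_it_kwh, default=None)


# Hypothetical placeholder candidates; replace with your own quotes and traceroutes.
candidates = [
    Region("metro-near-users", usd_per_kwh=0.12, pue=1.3, rtt_to_users_ms=10),
    Region("interior-wind", usd_per_kwh=0.07, pue=1.25, rtt_to_users_ms=45),
    Region("hydro-nearshore", usd_per_kwh=0.05, pue=1.2, rtt_to_users_ms=125),
]

tier1 = pick_region(candidates, max_rtt_ms=20)            # metro or one hop away
tier2 = pick_region(candidates, max_rtt_ms=60)            # hypothetical Tier 2 cap
tier3 = pick_region(candidates, max_rtt_ms=float("inf"))  # latency-insensitive
print(tier1.name, tier2.name, tier3.name)
```

In practice you would also weigh capacity headroom and carbon intensity, which this sketch leaves out.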

Nearshore reality check: Brazil’s role in your mix

Brazil is not where you host a US-facing chat endpoint. It is where you can sensibly run Tier 3 (and some Tier 2) workloads if you want price and carbon advantages with nearshore time-zone overlap.

Power and carbon: Brazil’s electricity mix is predominantly renewable, anchored by hydro and wind. That often means lower grid carbon intensity (gCO2/kWh) than the US average, which is useful in enterprise audits that now look beyond offsets.
Latency math: São Paulo ↔ Miami RTT is typically 110–140 ms. That’s fine for batch and background work. Not fine for interactive token streams to US users.
Team overlap: 6–8 hours of workday overlap with US time zones simplifies ops and incident response compared to far-shore options.
Trade-offs: Hardware import duties and logistics can be painful if you’re buying your own racks; partner selection matters. GPU clouds inside Brazil remain capacity-limited compared to North America, but that is changing. Treat Brazil as a batch anchor, not your only region.

Three pragmatic architectures we actually recommend

1) Split-plane AI: keep the hot loop local, push the heavy lifting to cheap power

What: Tier 1 inference in one or two US metros closest to your users. Tier 2/3 in a low-cost, low-carbon region (e.g., Quebec, interior US wind, or Brazil).
How: Version your models and embedding indices. Promote artifacts via a registry (OCI-backed works well). Use change data capture (CDC) to move user-consented content required for batch jobs; keep high-risk PII pinned to its origin region when possible.
Why: You get predictable latency, cheaper bulk compute, and cleaner carbon numbers you can actually show on a slide.
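Artifact promotion stays reviewable and reversible if every artifact carries a content digest and the serving region refuses anything that doesn’t match. A minimal Python sketch of that idea using OCI-style `sha256:` digests; it does not call any real registry API, and the manifest fields, artifact name, and region labels are hypothetical.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    # OCI-style content digest: algorithm prefix plus hex-encoded hash.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(name: str, version: str, artifact: bytes, source_region: str) -> dict:
    """Record what was built, where, and exactly which bytes it produced."""
    return {
        "name": name,
        "version": version,
        "digest": digest(artifact),
        "size": len(artifact),
        "source_region": source_region,
    }


def verify_promotion(manifest: dict, received: bytes) -> bool:
    """Serve an artifact only if the bytes that arrived match the manifest."""
    return len(received) == manifest["size"] and digest(received) == manifest["digest"]


# Hypothetical artifact: a nightly embedding index built in the batch region.
index_bytes = b"...serialized index..."
manifest = build_manifest("product-embeddings", "2026.06.26", index_bytes, "batch-region")
print(json.dumps(manifest, indent=2))
print(verify_promotion(manifest, index_bytes))            # bytes intact
print(verify_promotion(manifest, index_bytes + b"\x00"))  # corrupted or tampered
```

Rolling back is then a matter of pointing the serving region at an older manifest whose digest it has already verified.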

2) Neocloud sandwich: hyperscaler edges + specialist GPU regions

What: Frontend and latency-critical microservices on your existing hyperscaler. GPU-heavy jobs placed with a specialist provider who proves power price and PUE advantages.
How: Connect via private links or dedicated VPNs; put data contracts and DPAs in place for cross-border flows. Ask the GPU provider for substation/feeder details and PUE/WUE attestation, not just “we have H100s.”
Why: You buy flexibility and faster capacity ramps without moving your whole stack. This is the fastest path to real savings that most Series B–D teams can execute.

3) Own a slice: reserved racks where power is right

What: Commit to a small number of reserved racks (one to four) in a facility with verifiably low $/kWh and PUE. Populate with the GPU mix your workloads actually need (often L40S/A100-class for inference, H100-class for specific training).
How: Work with a partner who handles procurement, import/export, and smart RMA logistics. Demand transparent power billing — you want a kWh line item, not a blended guess.
Why: If your AI load is stable, the spread between retail GPU rentals and owning/long-leasing often exceeds power and ops costs by a wide margin. You also stop getting whiplashed by region-level capacity crunches.
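Whether owning beats renting is a break-even question you can answer with your own numbers. A hedged Python sketch: `monthly_rent_cost` and `monthly_own_cost` are hypothetical helpers, every input is a placeholder rather than a market price, and `ops_usd_per_month` lumps rack rent, remote hands, and networking together.

```python
def monthly_rent_cost(gpus: int, usd_per_gpu_hour: float, hours: float = 720.0) -> float:
    """Reserved rental: you pay every hour whether or not the GPUs are busy."""
    return gpus * usd_per_gpu_hour * hours


def monthly_own_cost(gpus: int, capex_per_gpu: float, amortization_months: int,
                     watts_per_gpu: float, utilization: float, pue: float,
                     usd_per_kwh: float, ops_usd_per_month: float,
                     hours: float = 720.0) -> float:
    """Owned or long-leased racks: amortized hardware + metered power + ops."""
    hardware = gpus * capex_per_gpu / amortization_months
    facility_kw = gpus * utilization * watts_per_gpu / 1000.0 * pue
    power = facility_kw * hours * usd_per_kwh  # the kWh line item you asked for
    return hardware + power + ops_usd_per_month


# Every input below is a hypothetical placeholder, not a market quote.
rent = monthly_rent_cost(64, usd_per_gpu_hour=2.50)
own = monthly_own_cost(64, capex_per_gpu=30_000, amortization_months=36,
                       watts_per_gpu=700, utilization=0.7, pue=1.3,
                       usd_per_kwh=0.06, ops_usd_per_month=15_000)
print(f"rent ${rent:,.0f}/mo vs own ${own:,.0f}/mo, spread ${rent - own:,.0f}/mo")
```

The comparison only holds for stable load: both options keep charging for idle hours, so rerun it at the utilization you actually expect before committing racks.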

The procurement checklist that actually filters providers

When you issue your RFP or grill a vendor, ask questions that reveal their power position and operational truth. If they can’t answer, they don’t run at meaningful scale.

Power price and structure: What is your effective $/kWh to us, and how is it indexed? Is it a pass-through from a PPA or utility tariff? Any seasonal or demand charges we should model?
PUE and WUE: What is your trailing 12-month PUE and water usage, by facility? Provide third-party attestation or telemetry screenshots. We want ≤ 1.3 PUE for new builds.
Capacity and interconnection: Which substation feeds you? What is the available headroom today and on your 12–18 month roadmap? Any active interconnection queue IDs?
Carbon accounting: Hourly location-based carbon intensity (gCO2/kWh) and any 24/7 carbon-free energy matching. Annual offsets don’t count as clean for RFPs that ask.
Latency disclosures: Round-trip latency from your facility to major user metros we care about. Show traceroutes, not marketing maps.
Thermal strategy: Air vs. liquid cooling now and in 12 months. Hot aisle containment? Liquid-ready? Can you densify without moving cages?
Failure domains: How are power feeds, chillers, and network paths isolated? Show us at least N+1 on power and cooling.
Contract outs: Explicit ability to ramp down or move regions if latency or PUE targets are missed. Audit rights on power invoices if we’re on pass-through.
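Vendor answers are easier to compare when the checklist becomes a scorecard that emits red flags. A Python sketch; the `VendorAnswers` fields map to the questions above and the 1.3 PUE target comes from the checklist, while the field names and the set of redundancy labels accepted as N+1 or better are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VendorAnswers:
    trailing_12m_pue: Optional[float]  # None if they would not say
    pue_attested: bool                 # third-party attestation or telemetry
    kwh_line_item: bool                # power billed as a transparent kWh pass-through
    substation_named: bool             # they told you which substation feeds them
    hourly_carbon_data: bool           # location-based gCO2/kWh per hour
    power_redundancy: str              # e.g. "N", "N+1", "2N"
    cooling_redundancy: str
    exit_clause: bool                  # ramp-down / region-move rights in the MSA


# Redundancy labels treated as "at least N+1" (assumption).
REDUNDANT = {"N+1", "N+2", "2N", "2N+1"}


def red_flags(v: VendorAnswers, max_pue: float = 1.3) -> List[str]:
    """List the checklist answers that should stop or slow a deal."""
    flags = []
    if v.trailing_12m_pue is None or not v.pue_attested:
        flags.append("no attested trailing 12-month PUE")
    elif v.trailing_12m_pue > max_pue:
        flags.append(f"PUE {v.trailing_12m_pue} exceeds target {max_pue}")
    if not v.kwh_line_item:
        flags.append("blended power pricing, no kWh line item")
    if not v.substation_named:
        flags.append("cannot name the feeding substation")
    if not v.hourly_carbon_data:
        flags.append("no hourly location-based carbon data")
    if v.power_redundancy not in REDUNDANT or v.cooling_redundancy not in REDUNDANT:
        flags.append("power or cooling below N+1")
    if not v.exit_clause:
        flags.append("no contractual exit if targets are missed")
    return flags


# Hypothetical vendor response.
vendor = VendorAnswers(1.45, True, False, True, False, "N+1", "N", True)
for flag in red_flags(vendor):
    print("-", flag)
```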

Budgeting the truth: where the money really goes

For most startups renting GPUs month-to-month, raw power is a minority of the invoice. But that doesn’t make siting a rounding error. It leaks into price in three ways:

Provider margin: Expensive regions force higher pricing or tighter quotas. The same GPU hour in a hydro-powered region is often 10–30% cheaper net to you once you negotiate.
Capacity timing: If your vendor can’t secure megawatts, you can’t scale. “We’ll have H200s in Q1” means nothing if their substation is tapped.
Carbon in the sales cycle: If you sell into enterprises or public sector, clean-region claims shorten security and sustainability reviews. That saves real calendar time.

Security and data governance: the blockers you can actually clear

PII boundaries: Keep high-risk PII and regulated data resident in-region. For batch AI, move only the minimum required fields (tokenized or masked). Use object-level access controls and data contracts you can show an auditor.
Cross-border contracts: Update DPAs to name your batch regions and processors explicitly. For Brazil (LGPD), EU (GDPR), and US state privacy laws, document what flows and why.
Provenance and artifacts: Version models, embeddings, and data snapshots. Provenance makes cross-region promotion reviewable and reversible.
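For the “move only the minimum required fields” rule, a keyed token lets batch jobs join and deduplicate on an identifier without ever seeing the raw value. A minimal Python sketch using the standard library’s `hmac`; the field names and the key handling are hypothetical, and the key is assumed to stay in the origin region.

```python
import hashlib
import hmac
from typing import Dict, Set


def tokenize(value: str, key: bytes) -> str:
    """Keyed, deterministic token: equal inputs map to equal tokens, so batch
    jobs can still join and dedupe without seeing the raw value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_record(record: Dict[str, str], keep: Set[str],
                    tokenize_fields: Set[str], key: bytes) -> Dict[str, str]:
    """Return only what the batch job needs; every other field is dropped."""
    out = {f: record[f] for f in keep if f in record}
    out.update({f: tokenize(record[f], key) for f in tokenize_fields if f in record})
    return out


# Hypothetical key: it lives in the origin region's secret store and never
# ships with the data.
ORIGIN_KEY = b"replace-with-a-key-from-your-secret-manager"

row = {"user_id": "u-123", "email": "ana@example.com",
       "ticket_text": "Reset my password", "card_last4": "4242"}
print(minimize_record(row, keep={"ticket_text"}, tokenize_fields={"user_id"},
                      key=ORIGIN_KEY))
```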

30/60/90-day plan to de-risk and save

Day 0–30: Instrument and model

Measure kWh per 1,000 tokens for your main inference paths (your provider can estimate it if you can’t). Track utilization and p95/p99 latency.
Classify workloads by the three tiers. Be ruthless — most “real-time” jobs are not.
Build a simple TCO sheet: GPU $/hr, assumed utilization, PUE, $/kWh, and latency penalty if remote. Sanity-check against provider quotes.
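Once you have throughput and power telemetry, kWh per 1,000 tokens and cost per million tokens fall out of the same few inputs as the TCO sheet. A Python sketch with hypothetical names and placeholder numbers; it treats the latency penalty as a pass/fail gate rather than a dollar figure, so it is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    gpus: int
    gpu_usd_per_hour: float   # rental rate; 0 if you own the hardware
    peak_tokens_per_s: float  # measured fleet-wide throughput at full load
    utilization: float        # average fraction of peak actually used
    avg_gpu_watts: float      # measured average draw per GPU
    pue: float
    usd_per_kwh: float        # 0 if power is already bundled into the GPU rate


def tokens_per_hour(s: Scenario) -> float:
    return s.peak_tokens_per_s * s.utilization * 3600.0


def facility_kw(s: Scenario) -> float:
    return s.avg_gpu_watts * s.gpus * s.pue / 1000.0


def kwh_per_1k_tokens(s: Scenario) -> float:
    # One hour of facility load divided by one hour of tokens, scaled to 1,000.
    return facility_kw(s) / tokens_per_hour(s) * 1000.0


def usd_per_million_tokens(s: Scenario) -> float:
    usd_per_hour = s.gpus * s.gpu_usd_per_hour + facility_kw(s) * s.usd_per_kwh
    return usd_per_hour / tokens_per_hour(s) * 1_000_000.0


# Hypothetical 8-GPU inference pool; replace every number with your own telemetry.
pool = Scenario(gpus=8, gpu_usd_per_hour=2.00, peak_tokens_per_s=10_000,
                utilization=0.5, avg_gpu_watts=500, pue=1.2, usd_per_kwh=0.10)
print(f"{kwh_per_1k_tokens(pool):.6f} kWh per 1k tokens")
print(f"${usd_per_million_tokens(pool):.3f} per 1M tokens")
```

Comparing the same `Scenario` with a different `usd_per_kwh` and `pue` is the quickest sanity check against a provider’s quote for another region.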

Day 31–60: Prove the split

Pilot Tier 3 in a cheap, clean region (Quebec, Midwest wind, or Brazil with a vetted partner). Move one embedding pipeline or a nightly index rebuild.
Instrument end-to-end. Validate artifact promotion, data masking, and incident runbooks across time zones.
Run a failure game day: cut the remote region, confirm local fallbacks keep SLAs intact.
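A game day is easier to repeat when the placement policy is explicit code you can exercise with the remote region switched off. The policy below is a hypothetical Python sketch, not a scheduler API: Tier 1 always stays local, Tier 2 borrows spare local GPUs when the remote region is down, and Tier 3 queues.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RegionState:
    remote_up: bool
    local_free_gpus: int  # headroom beyond what Tier 1 already reserves


def place(tier: int, gpus_needed: int, state: RegionState) -> str:
    """Hypothetical placement policy for a split-plane deployment."""
    if tier == 1:
        return "local"   # the hot loop never leaves the user metro
    if state.remote_up:
        return "remote"  # Tier 2/3 default to the cheap-power region
    if tier == 2 and state.local_free_gpus >= gpus_needed:
        return "local"   # degrade gracefully: borrow spare local capacity
    return "queued"      # Tier 3 (or Tier 2 without headroom) waits


def game_day(jobs: List[Tuple[str, int, int]], local_free_gpus: int) -> Dict[str, str]:
    """Cut the remote region and record where each (name, tier, gpus) job lands."""
    state = RegionState(remote_up=False, local_free_gpus=local_free_gpus)
    placements = {}
    for name, tier, gpus in jobs:
        where = place(tier, gpus, state)
        if where == "local" and tier != 1:
            state.local_free_gpus -= gpus  # borrowed headroom is now used
        placements[name] = where
    return placements


jobs = [("chat", 1, 4), ("rerank", 2, 4), ("rerank-bulk", 2, 8), ("embeddings", 3, 16)]
print(game_day(jobs, local_free_gpus=8))
```

Asserting on the returned placements in CI turns the game day from a one-off drill into a regression check on the policy.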

Day 61–90: Lock in capacity and SLAs

Negotiate 6–12 month capacity in the winning cheap-power region. Bake PUE and latency targets into the MSA with outs.
Right-size Tier 1 capacity with realistic concurrency and caching. Overprovision less once your batch offload is real.
Publish carbon numbers in your security packet. It shortens enterprise cycles and aligns you with what boards increasingly expect.
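Carbon numbers hold up better in a security packet when they are computed hourly (location-based), because a flat average hides load that lands in dirty-grid hours. A Python sketch with hypothetical helper names and made-up four-hour data chosen only to show the effect; in this toy window the flat average understates emissions by about 28%.

```python
from typing import Sequence


def hourly_kg_co2(kwh: Sequence[float], g_per_kwh: Sequence[float]) -> float:
    """Location-based emissions: match each hour's energy to that hour's grid intensity."""
    if len(kwh) != len(g_per_kwh):
        raise ValueError("need one intensity value per hour of energy")
    return sum(e * g for e, g in zip(kwh, g_per_kwh)) / 1000.0


def flat_average_kg_co2(kwh: Sequence[float], g_per_kwh: Sequence[float]) -> float:
    """Total energy times a simple average of the hourly intensities."""
    avg = sum(g_per_kwh) / len(g_per_kwh)
    return sum(kwh) * avg / 1000.0


# Hypothetical four-hour window: load peaks exactly when the grid is dirtiest.
energy = [100, 100, 300, 300]   # facility kWh per hour
intensity = [50, 50, 400, 400]  # grid gCO2/kWh per hour
print(hourly_kg_co2(energy, intensity), flat_average_kg_co2(energy, intensity))
```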

What about “future chips will fix it”?

Yes, vendors are pursuing sub-1 nm nodes, new interposers, and inference-specific silicon. Efficiency will improve. But two things stay true: moving tokens eats energy, and getting new megawatts connected takes time. The winning teams budget for power they can actually get next quarter and place the right workloads in the right regions now, then ride efficiency gains as upside.

Where DHD Tech fits

We design and operate split-plane AI backends for US startups with nearshore pods in Brazil. Practically, that means we take one expensive “always-on” AI job per quarter, move it to a cheaper, cleaner region without breaking latency or compliance, and leave you with dashboards that show both dollars and grams of CO2 saved. You don’t need to boil the ocean to see material gains — you need one well-chosen move.

Key Takeaways

Power price and availability now gate AI capacity and unit economics as much as GPU supply.
Segment workloads by latency: keep Tier 1 local; push Tier 2/3 to cheap, clean regions.
Demand PUE, carbon, and interconnection transparency from GPU providers — not just model names.
Brazil is a strong nearshore anchor for batch AI: renewable-heavy grid, cost advantages, and time-zone overlap.
Execute a 90-day split-plane pilot to bank savings without risking your SLAs.