NVIDIA B200 Cloud Pricing in 2026: How to Compare Per-Hour GPU Costs

B200 rental prices vary widely across clouds, so the number that matters is not dollars per hour but dollars per million tokens once you factor in throughput and utilization.

Jun 23, 2026 · 3 min read

Key takeaways

B200 on-demand rates differ substantially between hyperscalers and specialist GPU clouds.
Per-hour price is the headline; cost per 1M tokens is the metric that decides margins.
Reserved and spot capacity can cut the hourly rate sharply versus on-demand.
Utilization is the hidden multiplier: an idle B200 still bills full price.
Convert GPU-hour cost to per-token cost before comparing against per-token API pricing.

Why is B200 pricing so different across providers?

Because providers are not selling the same thing. A hyperscaler bundles support, networking, compliance, and reliability into the hourly rate; a specialist GPU cloud often strips that down to compete on raw price. Add region, contract length, and on-demand versus reserved versus spot, and the same chip can carry very different sticker prices.

The honest approach is to normalize: same GPU count, same commitment level, same region, then look at the spread rather than any single quote.

What number should you actually compare?

Not dollars per hour. Dollars per million tokens. A cheap hourly rate on a GPU you cannot keep busy is more expensive than a higher rate at full utilization.

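The bridge is throughput. If a B200 instance sustains T tokens per second for your model and batch settings, it produces T × 3,600 tokens per hour when fully busy. Divide the hourly rate by that volume to get cost per token, then multiply by 1,000,000 for cost per million tokens. If the instance is busy only part of the time, scale the token volume by that utilization fraction first. The provider with the lower hourly price does not always win once throughput and utilization differ.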
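The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the function name, the `utilization` parameter, and every number in the example are assumptions made up for the demonstration, not real provider quotes or measured B200 throughput.

```python
def cost_per_million_tokens(hourly_rate, tokens_per_second, utilization=1.0):
    """Convert a GPU-hour price into dollars per one million tokens.

    hourly_rate: dollars billed per instance-hour.
    tokens_per_second: sustained throughput T while the instance is busy.
    utilization: fraction of billed time spent serving traffic (0 < u <= 1).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Tokens actually produced per billed hour: T x 3,600, scaled by utilization.
    tokens_per_billed_hour = tokens_per_second * 3600 * utilization
    return hourly_rate / tokens_per_billed_hour * 1_000_000


# Hypothetical quotes, not real provider prices.
provider_a = cost_per_million_tokens(6.00, 5000, utilization=0.50)
provider_b = cost_per_million_tokens(4.00, 5000, utilization=0.25)
print(f"A: ${provider_a:.2f} per 1M tokens")
print(f"B: ${provider_b:.2f} per 1M tokens")
```

With these made-up inputs, provider A costs less per million tokens than provider B even though its hourly rate is higher, because it is kept busy twice as much of the time.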

How do reserved and spot pricing change the picture?

A lot. On-demand is the most expensive way to rent a B200. Committing to reserved capacity typically lowers the effective hourly rate in exchange for a term, and spot or preemptible instances can be cheaper still in exchange for interruption risk. For steady inference workloads, reserved capacity usually wins; for bursty or batch jobs, spot can be the cheapest route if your system tolerates interruptions.
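One way to make the on-demand versus reserved trade-off concrete is a break-even check. The sketch below assumes a simplified billing model that the article does not spell out: reserved capacity bills for every hour of the term, while on-demand bills only for the hours you actually run. The function names and rates are hypothetical, and spot interruption cost is not modeled.

```python
def reserved_break_even_fraction(on_demand_rate, reserved_rate):
    """Share of the term you must use the GPU for reserved to match on-demand.

    Assumes reserved bills for every hour of the term, while on-demand
    bills only for hours actually used.
    """
    if on_demand_rate <= 0 or reserved_rate <= 0:
        raise ValueError("rates must be positive")
    return reserved_rate / on_demand_rate


def cheaper_option(on_demand_rate, reserved_rate, used_fraction):
    """Compare average cost per hour of the term under both models."""
    on_demand_cost = on_demand_rate * used_fraction  # pay only for used hours
    reserved_cost = reserved_rate                    # pay for every hour
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


# Hypothetical rates in dollars per GPU-hour.
print(reserved_break_even_fraction(6.00, 4.00))
print(cheaper_option(6.00, 4.00, used_fraction=0.9))
print(cheaper_option(6.00, 4.00, used_fraction=0.3))
```

Under these assumptions, a steady workload that runs 90% of the term favors reserved capacity, while one that runs 30% of the term is cheaper on-demand.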

When does renting a B200 beat per-token API pricing?

When you have enough sustained, predictable volume to keep the GPU busy. Self-hosting on rented B200s turns a per-token price into a fixed hourly cost, so the per-token math only beats an API once utilization is high. Below that, a managed per-token API is usually cheaper and simpler. Above it, self-hosted throughput can undercut API pricing, but you absorb utilization and ops risk.
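The utilization threshold can be estimated directly. The sketch below is a rough model under stated assumptions: it counts only the GPU-hour rate, ignores engineering and operations cost, and uses a hypothetical function name and made-up prices and throughput.

```python
def break_even_utilization(hourly_rate, tokens_per_second, api_price_per_million):
    """Utilization at which rented-GPU cost per 1M tokens equals the API price.

    A result above 1.0 means renting cannot match the API price at this
    throughput, even with the GPU busy all the time.
    """
    if tokens_per_second <= 0 or api_price_per_million <= 0:
        raise ValueError("throughput and API price must be positive")
    tokens_per_hour_at_full_load = tokens_per_second * 3600
    cost_per_million_at_full_load = (
        hourly_rate / tokens_per_hour_at_full_load * 1_000_000
    )
    return cost_per_million_at_full_load / api_price_per_million


# Hypothetical inputs: $6.00 per GPU-hour, 5,000 tokens/s, $2.00 per 1M API tokens.
u = break_even_utilization(6.00, 5000, 2.00)
print(f"Break-even utilization: {u:.1%}")
```

If your realistic utilization sits above the printed threshold, renting is cheaper per token on this simplified model. If it sits below, the per-token API wins before ops cost is even counted.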

The takeaway: collect per-hour rates, then convert every one of them to cost per million tokens at realistic utilization before you choose. You can model GPU-hour cost, throughput, and per-token economics side by side in Calcaas.

Frequently asked questions

How much does it cost to rent an NVIDIA B200 in the cloud?

It varies widely by provider, region, and commitment, with on-demand being the most expensive and reserved or spot capacity considerably cheaper. Rather than rely on one headline rate, compare normalized quotes for the same GPU count and term.

Is per-hour GPU price the right way to compare providers?

No. Per-hour price is only the input. The decision metric is cost per million tokens, which depends on throughput and utilization. A higher hourly rate at full utilization can be cheaper per token than a low rate on an idle GPU.

Should I rent B200s or use a per-token API?

Rent when you have sustained, high utilization that keeps the hardware busy; otherwise a per-token API is usually cheaper and simpler. Convert your expected GPU-hour cost to per-token cost and compare directly.

What makes B200 cloud pricing drop?

Longer commitments through reserved capacity and interruption-tolerant spot or preemptible instances both lower the effective hourly rate versus on-demand. Region and provider type also matter.
