OpenAI's Internal Token Use Grew Up to 56x: What It Means for Your AI Budget

OpenAI's own usage data shows median internal output tokens rising as much as 56x since November 2025, a warning that per-seat AI costs can compound far faster than headline price cuts.

Jun 26, 2026 · 4 min read

Key takeaways

OpenAI reported median internal output-token growth of 56x in Research, 32x in Customer Support, 27x in Engineering, and 13x in Legal since November 2025.
The driver is behavioral, not price: agents now do more multi-step work per task, so token volume per seat climbs even when the price per token falls.
A flat per-seat price can quietly turn margin-positive seats into loss-makers as usage compounds.
Forecast token volume by department, not just by headcount, and re-check it every quarter.

What did OpenAI actually report?

According to data surfaced by the AINews roundup on Latent.Space, OpenAI's median internal Codex output tokens grew about 56x in Research, 32x in Customer Support, 27x in Engineering, and 13x in Legal since November 2025. These are growth multiples in tokens generated per user, not dollar figures, but tokens are the unit that maps directly to cost.

The headline is the spread. The same tool, rolled out across one company, produced wildly different token appetites depending on the job. Research and support workloads ran away; legal grew much more slowly. Growth in the fastest department (56x) was roughly four times growth in the slowest (13x). If a single company sees that kind of gap between departments, a SaaS product serving many companies should expect a range at least as wide.

Why are tokens growing faster than prices are falling?

Model prices per million tokens have trended down for two years. It is tempting to assume your AI bill follows them down. It does not, because consumption is rising faster than unit prices are dropping.

The mechanism is agentic work. A year ago a user sent one prompt and got one answer. Now a single request can trigger a chain: read context, plan, call tools, generate, revise. Each step spends tokens. So output per active user compounds even as the price per token shrinks. A 30 percent price cut does not rescue you if volume per seat triples in the same period: 0.7 times 3 is 2.1, so the bill per seat still roughly doubles.
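The arithmetic is simple enough to sketch in a few lines of Python. The function name and the inputs below are illustrative assumptions, not figures from OpenAI; the point is only that the price multiple and the volume multiple multiply.

```python
def net_cost_multiple(price_change_pct: float, volume_multiple: float) -> float:
    """Return the new bill as a multiple of the old one.

    price_change_pct: change in price per token, e.g. -30 for a 30 percent cut.
    volume_multiple: tokens per seat relative to the starting period, e.g. 3.0.
    """
    price_multiple = 1 + price_change_pct / 100
    return price_multiple * volume_multiple


def break_even_volume_multiple(price_change_pct: float) -> float:
    """Volume growth at which a price cut is fully cancelled out."""
    return 1 / (1 + price_change_pct / 100)


# Illustrative inputs only: a 30 percent price cut while volume per seat triples.
print(f"Bill multiple: {net_cost_multiple(-30, 3.0):.2f}x")
print(f"Volume growth that cancels the cut: {break_even_volume_multiple(-30):.2f}x")
```

Under these assumptions, a 30 percent cut is wiped out once volume per seat grows by a little over 40 percent.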

How does this break flat per-seat pricing?

Most AI features are still sold at a flat monthly price per seat. That works only while the average seat's token cost stays well below the price. The OpenAI data shows why that assumption is fragile.

Say, illustratively, you priced a seat assuming a steady monthly token cost and a comfortable margin. If the heaviest cohort of users grows its token use even 10x, that cohort can flip from profitable to underwater while your price stays fixed. You will not see it in aggregate revenue. You will see it in a slowly sinking gross margin that nobody can explain. The fix is not necessarily usage-based billing, but you do need to know your cost-to-serve per cohort before you decide.
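To make that concrete, here is a minimal sketch in Python. Every number in it is made up for illustration: the 30 dollar seat price, the cohort sizes, and the per-seat token costs are assumptions, not data from OpenAI or any provider. It shows how one cohort can go underwater while the blended margin still looks healthy.

```python
from dataclasses import dataclass

SEAT_PRICE = 30.0  # hypothetical flat monthly price per seat, in dollars


@dataclass
class Cohort:
    name: str
    seats: int
    token_cost_per_seat: float  # hypothetical monthly provider cost, in dollars


def seat_margin(cohort: Cohort, growth: float = 1.0) -> float:
    """Dollar margin per seat after token use grows by `growth`."""
    return SEAT_PRICE - cohort.token_cost_per_seat * growth


def blended_margin_pct(cohorts: list[Cohort], growth_by_name: dict[str, float]) -> float:
    """Gross margin across all cohorts, as a percentage of revenue."""
    revenue = sum(c.seats * SEAT_PRICE for c in cohorts)
    cost = sum(
        c.seats * c.token_cost_per_seat * growth_by_name.get(c.name, 1.0)
        for c in cohorts
    )
    return 100 * (revenue - cost) / revenue


light = Cohort("light", seats=900, token_cost_per_seat=2.0)
heavy = Cohort("heavy", seats=100, token_cost_per_seat=6.0)
cohorts = [light, heavy]

# Only the heavy cohort grows 10x; the price stays fixed.
growth = {"heavy": 10.0}

print(f"Heavy seat margin before: {seat_margin(heavy):.2f}")
print(f"Heavy seat margin after:  {seat_margin(heavy, 10.0):.2f}")
print(f"Blended margin before: {blended_margin_pct(cohorts, {}):.1f}%")
print(f"Blended margin after:  {blended_margin_pct(cohorts, growth):.1f}%")
```

In this toy case every heavy seat loses money after the jump, yet the blended margin only slips from the low nineties to the mid seventies. That is the slow, hard-to-explain decline described above, and it is why the per-cohort view matters.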

How should founders forecast token cost now?

Start by forecasting tokens by use case, not by headcount. Group users by what they actually do with the product, estimate output tokens per task and tasks per month for each group, then multiply by current provider rates. Model a low, expected, and high case, because the OpenAI numbers show growth varying about fourfold between departments (13x to 56x), so the high case can sit far above the average.
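A rough sketch of that forecast in Python might look like the following. The use cases, user counts, task volumes, and the 10 dollar rate per million output tokens are all placeholder assumptions; swap in your own estimates and your provider's current rates. For simplicity it counts output tokens only and ignores input tokens.

```python
# All figures are hypothetical placeholders, not OpenAI's or any provider's data.
PRICE_PER_MILLION_OUTPUT_TOKENS = 10.0  # dollars; substitute your provider's rate

SCENARIOS = ("low", "expected", "high")

# Per use case: user count, then (low, expected, high) estimates.
USE_CASES = {
    "research": {"users": 20, "tasks_per_month": (40, 80, 200),
                 "tokens_per_task": (5_000, 15_000, 60_000)},
    "support": {"users": 50, "tasks_per_month": (200, 400, 800),
                "tokens_per_task": (1_000, 2_000, 6_000)},
    "legal": {"users": 10, "tasks_per_month": (20, 40, 80),
              "tokens_per_task": (2_000, 4_000, 8_000)},
}


def monthly_cost(users: int, tasks: int, tokens_per_task: int,
                 price_per_million: float) -> float:
    """Monthly output-token cost in dollars for one use case."""
    return users * tasks * tokens_per_task * price_per_million / 1_000_000


def forecast(use_cases: dict, price_per_million: float) -> dict[str, float]:
    """Total monthly cost across use cases for each scenario."""
    totals = {name: 0.0 for name in SCENARIOS}
    for spec in use_cases.values():
        for i, scenario in enumerate(SCENARIOS):
            totals[scenario] += monthly_cost(
                spec["users"],
                spec["tasks_per_month"][i],
                spec["tokens_per_task"][i],
                price_per_million,
            )
    return totals


for scenario, cost in forecast(USE_CASES, PRICE_PER_MILLION_OUTPUT_TOKENS).items():
    print(f"{scenario:>8}: ${cost:,.0f} per month")
```

Keeping the three estimates side by side per use case makes it obvious which group drives the gap between the expected and high cases.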

Then pressure-test your price. Hold your price fixed and ask what happens to margin if your power users grow token use 10x to 50x over a year. If that scenario wipes out your margin, you have a packaging problem to solve now, while it is cheap. You can model these cohorts, provider rates, and margin scenarios in Calcaas before they show up in your P&L.
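As a minimal sketch of that pressure test, the Python below holds a hypothetical seat price fixed, sweeps the growth multiple for power users, and solves for the multiple at which blended margin hits zero. The seat counts, costs, and price are invented for illustration and are not drawn from the OpenAI data.

```python
def blended_margin_pct(price: float, light_seats: int, light_cost: float,
                       power_seats: int, power_cost: float, growth: float) -> float:
    """Blended gross margin (%) when only power users grow token use by `growth`."""
    revenue = price * (light_seats + power_seats)
    cost = light_seats * light_cost + power_seats * power_cost * growth
    return 100 * (revenue - cost) / revenue


def break_even_growth(price: float, light_seats: int, light_cost: float,
                      power_seats: int, power_cost: float) -> float:
    """Power-user growth multiple at which blended margin reaches zero."""
    revenue = price * (light_seats + power_seats)
    return (revenue - light_seats * light_cost) / (power_seats * power_cost)


# Hypothetical plan: 30 dollar seats, 900 light users at 2 dollars of tokens
# per month, 100 power users at 6 dollars of tokens per month.
plan = dict(price=30.0, light_seats=900, light_cost=2.0,
            power_seats=100, power_cost=6.0)

for growth in (1, 10, 25, 50):
    margin = blended_margin_pct(growth=growth, **plan)
    print(f"power users at {growth:>2}x: blended margin {margin:6.1f}%")

print(f"margin reaches zero at {break_even_growth(**plan):.0f}x")
```

With these invented inputs the plan survives 10x and 25x growth but is underwater by 50x, which is exactly the kind of result that signals a packaging problem worth fixing early.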

The takeaway: budget for AI by behavior and re-forecast quarterly, because token growth, not token price, is what will move your margin.

Frequently asked questions

Does a 56x token increase mean a 56x cost increase?

Not exactly, because per-token prices have fallen over the same period, which offsets part of the volume growth. But tokens are the cost driver, so a large volume jump still pushes the bill up sharply unless price cuts keep pace. To fully offset a 56x increase in volume, the price per token would have to fall by about 98 percent.

Why do different departments use such different amounts of tokens?

Token use tracks how multi-step and open-ended the work is. Research and support involve long, iterative, tool-heavy tasks that generate many tokens, while more templated work like parts of legal review generates fewer per task.

Should I switch to usage-based pricing because of this?

Not automatically. The first step is knowing your cost-to-serve per cohort. If a few heavy users threaten margin, options include usage caps, tiered plans, or usage-based add-ons. The data tells you which lever you need before you pick one.

How often should I re-forecast AI token costs?

At least quarterly. Usage in this market is changing fast enough that an annual model will be stale within months, as the November-2025-to-now growth shows.
