LLM Economics

Token math, model selection, and the true unit cost of every generation.

Jun 22, 2026 · 5 min read

AI Cost Optimization in 2026: A Practical Guide for Founders

Cut your AI bill in 2026 by working five levers in order: model routing, prompt size, caching, output limits, and inference efficiency. Then re-check that your pricing still covers the new cost basis.

LLM Economics

Jun 21, 2026 · 3 min read

Self-Hosting vs API: The Real Cost Math Behind '1/6 the Price'

Self-hosting an open LLM can cost a fraction of a frontier API, but only when your GPUs stay busy. The honest comparison is GPU dollars per hour divided by your actual throughput, versus the API price per token.
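
The comparison described above can be sketched as a small calculation. This is a minimal illustration, not the article's own method: the function name, the `utilization` parameter, and all numbers in the usage note are hypothetical placeholders, and it assumes throughput is measured in tokens per second on a GPU billed by the hour.

```python
def self_hosted_cost_per_million_tokens(
    gpu_dollars_per_hour: float,
    peak_tokens_per_second: float,
    utilization: float = 1.0,
) -> float:
    """Dollars per one million tokens served on a self-hosted GPU.

    utilization is the fraction of each billed hour the GPU spends
    serving tokens (an assumption used here to model idle time).
    """
    if peak_tokens_per_second <= 0:
        raise ValueError("peak_tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Actual throughput: tokens really served per billed hour.
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    # GPU dollars per hour divided by actual throughput, scaled to 1M tokens.
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


def self_hosting_is_cheaper(
    self_hosted_per_million: float, api_price_per_million: float
) -> bool:
    """Compare the self-hosted unit cost with an API price per 1M tokens."""
    return self_hosted_per_million < api_price_per_million
```

With made-up inputs of a $2.00/hour GPU that peaks at 1,000 tokens per second, dropping `utilization` from 1.0 to 0.25 quadruples the cost per million tokens, which is the sense in which the saving only holds when the GPUs stay busy.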

LLM Economics

Jun 21, 2026 · 4 min read

Inference Economics: Why a $13B Valuation Is a Bet on the Token Spread

When an inference provider raises at a $13B valuation, investors are buying the spread between what a token costs to serve and what you are charged. That spread is why your API price is not a cost floor.
