RAG / Search App Pricing Calculator
Model retrieval + generation costs for knowledge-base search and document Q&A products.
RAG products have two cost surfaces: indexing (one-time per document: embeddings + storage) and querying (recurring: every search embeds the query, then calls a generation model with the retrieved chunks in the prompt). Calcaas lets you separate these so a heavy-indexer customer doesn't tank your margin.
Common pricing models
Per-document indexed
One-time fee scaled to embedding cost + storage.
Per-query subscription
Monthly tier with a query cap; overage billed per query.
Seat + usage hybrid
Flat seat fee covers small usage; heavy users pay per query.
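The per-query subscription and seat + usage hybrid models above can be sketched as two small billing functions. This is a minimal illustration, not Calcaas's implementation: the function names, the rates, and the choice to pool the included queries across seats are all assumptions made for the example.

```python
def per_query_subscription_bill(queries_used: int, base_fee: float,
                                query_cap: int, overage_rate: float) -> float:
    """Monthly tier with a query cap; queries beyond the cap are billed individually."""
    overage_queries = max(0, queries_used - query_cap)
    return base_fee + overage_queries * overage_rate


def seat_usage_hybrid_bill(seats: int, seat_fee: float, included_per_seat: int,
                           queries_used: int, per_query_rate: float) -> float:
    """Flat seat fee covers a query allowance; usage above it is billed per query.

    Assumption: the allowance is pooled across all seats rather than tracked per user.
    """
    included_queries = seats * included_per_seat
    extra_queries = max(0, queries_used - included_queries)
    return seats * seat_fee + extra_queries * per_query_rate


if __name__ == "__main__":
    # All numbers below are hypothetical placeholders, not recommended prices.
    print(per_query_subscription_bill(1200, base_fee=49.0, query_cap=1000, overage_rate=0.05))
    print(seat_usage_hybrid_bill(5, seat_fee=15.0, included_per_seat=100,
                                 queries_used=650, per_query_rate=0.04))
```

Both functions charge nothing extra until usage passes the included amount, so light users see a predictable bill while heavy users cover their own query costs.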
Cost components to model
Embedding tokens (indexing)
Charged per million tokens; one-time per document.
Embedding tokens (queries)
Every search embeds the query text; the per-query cost is small but adds up at volume.
Generation tokens
Retrieved chunks fill most of the prompt; budget 4K–8K input tokens per query.
Vector DB hosting
Treat as a fixed component or per-document storage cost.
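As a rough sketch, the components above reduce to a one-time indexing cost plus a variable cost per query. The per-million-token prices, the 300-token answer length, and the 20-token query length below are illustrative placeholders, not quoted provider rates; check current price lists before relying on any figure.

```python
def indexing_cost(docs: int, avg_doc_tokens: int, embed_per_mtok: float) -> float:
    """One-time embedding cost for a document set (USD)."""
    return docs * avg_doc_tokens / 1_000_000 * embed_per_mtok


def cost_per_query(input_tokens: int, output_tokens: int, query_tokens: int,
                   gen_input_per_mtok: float, gen_output_per_mtok: float,
                   embed_per_mtok: float) -> float:
    """Variable cost of one search: query embedding + prompt input + generated answer (USD)."""
    embed = query_tokens / 1_000_000 * embed_per_mtok
    gen_in = input_tokens / 1_000_000 * gen_input_per_mtok
    gen_out = output_tokens / 1_000_000 * gen_output_per_mtok
    return embed + gen_in + gen_out


if __name__ == "__main__":
    # Placeholder prices in USD per 1M tokens (assumptions for illustration only).
    prices = dict(gen_input_per_mtok=0.15, gen_output_per_mtok=0.60, embed_per_mtok=0.02)
    # Compare the low and high ends of the 4K-8K input budget.
    for input_tokens in (4_000, 8_000):
        cost = cost_per_query(input_tokens, output_tokens=300, query_tokens=20, **prices)
        print(f"{input_tokens} input tokens: ${cost:.6f} per query")
```

Vector DB hosting is left out of the per-query figure on purpose, since it behaves as a fixed or per-document line item rather than a per-query one.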
Recommended models
| Provider | Model | Why |
|---|---|---|
| OpenAI | text-embedding-3-small | Cheap, dense, default embedding pick. |
| OpenAI | gpt-4o-mini | Synthesizes retrieved chunks at low cost. |
| Anthropic | claude-sonnet-4-6 | When answer quality matters more than cost. |
Example scenario
Setup
$49/mo plan, 1,000 docs indexed (avg 5K tokens), 500 queries/mo with 6K-token context per query.
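The arithmetic for this setup might look like the sketch below. The plan price, document count, token sizes, and query volume come from the scenario; every price, the answer length, the query length, and the hosting fee are assumptions added for illustration and should be replaced with your own figures.

```python
# Scenario inputs (from the setup above).
PLAN_PRICE = 49.00
DOCS = 1_000
AVG_DOC_TOKENS = 5_000
QUERIES_PER_MONTH = 500
CONTEXT_TOKENS = 6_000

# Assumptions (placeholders, not quoted rates).
EMBED_PER_MTOK = 0.02        # USD per 1M embedding tokens
GEN_INPUT_PER_MTOK = 0.15    # USD per 1M generation input tokens
GEN_OUTPUT_PER_MTOK = 0.60   # USD per 1M generation output tokens
OUTPUT_TOKENS = 300          # tokens per generated answer
QUERY_TOKENS = 20            # tokens embedded per search
VECTOR_DB_MONTHLY = 10.00    # fixed hosting fee in USD

# Token volumes.
indexing_tokens = DOCS * AVG_DOC_TOKENS                  # 5,000,000 (one-time)
query_input_tokens = QUERIES_PER_MONTH * CONTEXT_TOKENS  # 3,000,000 per month
query_output_tokens = QUERIES_PER_MONTH * OUTPUT_TOKENS
query_embed_tokens = QUERIES_PER_MONTH * QUERY_TOKENS

# Costs in USD.
indexing_cost = indexing_tokens / 1_000_000 * EMBED_PER_MTOK
token_cost = (query_input_tokens / 1_000_000 * GEN_INPUT_PER_MTOK
              + query_output_tokens / 1_000_000 * GEN_OUTPUT_PER_MTOK
              + query_embed_tokens / 1_000_000 * EMBED_PER_MTOK)
monthly_cost = token_cost + VECTOR_DB_MONTHLY
gross_margin = (PLAN_PRICE - monthly_cost) / PLAN_PRICE

print(f"One-time indexing cost: ${indexing_cost:.2f}")
print(f"Monthly token cost:     ${token_cost:.2f}")
print(f"Monthly total cost:     ${monthly_cost:.2f}")
print(f"Gross margin:           {gross_margin:.1%}")
```

Under these placeholder numbers the fixed hosting line is larger than the token costs, so the result is sensitive to the hosting assumption and to the generation model chosen.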
Watch out for
Onboarding-day indexing burst: a new customer's initial bulk upload concentrates embedding cost in the first month. Bill it as a one-time setup fee or amortize it.
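The two options differ only in when the indexing cost lands, which a short sketch makes concrete. The function name and the dollar amounts are hypothetical.

```python
def cost_schedule(indexing_cost: float, recurring_cost: float,
                  months: int, amortize: bool) -> list[float]:
    """Return the cost attributed to each month of a contract.

    amortize=False: the whole indexing burst lands in month 1 (cover it with a setup fee).
    amortize=True: the burst is spread evenly across all months.
    """
    if months < 1:
        raise ValueError("months must be at least 1")
    if amortize:
        share = indexing_cost / months
        return [recurring_cost + share for _ in range(months)]
    return [recurring_cost + indexing_cost] + [recurring_cost] * (months - 1)


if __name__ == "__main__":
    # Hypothetical figures: $120 indexing burst, $10/month recurring cost, 12-month term.
    print(cost_schedule(120.0, 10.0, months=12, amortize=False))
    print(cost_schedule(120.0, 10.0, months=12, amortize=True))
```

Total cost is the same either way; amortizing smooths the monthly margin but leaves the burst unrecovered if the customer churns early, which a setup fee avoids.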
Run the numbers for your RAG & search product
Free tier covers everything on this page. Pro unlocks 30+ currencies and live FX.