RAG / Search App Pricing Calculator

Model retrieval + generation costs for knowledge-base search and document Q&A products.

RAG products have two cost surfaces: indexing (a one-time cost per document, covering embeddings and storage) and querying (every search embeds the query, then calls a generation model with the retrieved chunks in the prompt). Calcaas lets you model the two separately, so a customer who indexes heavily doesn't tank your margin.

Common pricing models

Per-document indexed

One-time fee scaled to embedding cost + storage.

Per-query subscription

Monthly tier with a query cap; overage billed per query.

Seat + usage hybrid

Flat seat fee covers small usage; heavy users pay per query.
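
To make the three pricing models concrete, here is a minimal sketch of how each one could turn usage into a charge. The function names, parameters, and the idea of applying a markup to the per-document fee are illustrative assumptions, not Calcaas's actual formulas.

```python
def per_document_fee(doc_tokens: int, embed_rate_per_m: float,
                     storage_cost: float, markup: float) -> float:
    """One-time fee scaled to embedding cost plus storage (markup is an assumption)."""
    embedding_cost = doc_tokens / 1_000_000 * embed_rate_per_m
    return (embedding_cost + storage_cost) * markup


def per_query_subscription(queries: int, tier_fee: float,
                           query_cap: int, overage_rate: float) -> float:
    """Monthly tier with a query cap; queries above the cap are billed individually."""
    overage_queries = max(0, queries - query_cap)
    return tier_fee + overage_queries * overage_rate


def seat_usage_hybrid(seats: int, queries: int, seat_fee: float,
                      included_queries_per_seat: int,
                      per_query_rate: float) -> float:
    """Flat seat fee covers a small allowance; usage beyond it is billed per query."""
    included = seats * included_queries_per_seat
    billable_queries = max(0, queries - included)
    return seats * seat_fee + billable_queries * per_query_rate


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise each function.
    print(per_query_subscription(queries=1200, tier_fee=49.0,
                                 query_cap=1000, overage_rate=0.05))
    print(seat_usage_hybrid(seats=5, queries=700, seat_fee=20.0,
                            included_queries_per_seat=100, per_query_rate=0.10))
```

Clamping with max(0, ...) keeps a customer who stays under the cap from receiving a negative overage charge.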

Cost components to model

Embedding tokens (indexing)

Charged per million tokens; one-time per document.

Embedding tokens (queries)

Every search re-embeds the query — small but adds up.

Generation tokens

Retrieved chunks stuff the prompt; budget 4–8K input tokens per query.

Vector DB hosting

Treat as a fixed component or per-document storage cost.
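
The four components above can be sketched as two small cost functions, one per cost surface. The Rates fields and the example values are placeholder assumptions, not quoted provider prices; substitute the current per-million-token rates for the models you use. The output-token count is also an assumption, since only the input budget is given above.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """USD per 1M tokens. All values passed in are assumptions to be replaced."""
    embed_per_m: float
    gen_input_per_m: float
    gen_output_per_m: float


def indexing_cost(docs: int, tokens_per_doc: int, rates: Rates) -> float:
    """One-time embedding cost for indexing a set of documents."""
    return docs * tokens_per_doc / 1_000_000 * rates.embed_per_m


def query_cost(query_tokens: int, context_tokens: int,
               output_tokens: int, rates: Rates) -> float:
    """Cost of one search: embed the query, then generate from the retrieved context."""
    embed = query_tokens / 1_000_000 * rates.embed_per_m
    gen_in = context_tokens / 1_000_000 * rates.gen_input_per_m
    gen_out = output_tokens / 1_000_000 * rates.gen_output_per_m
    return embed + gen_in + gen_out


if __name__ == "__main__":
    rates = Rates(embed_per_m=0.02, gen_input_per_m=0.15, gen_output_per_m=0.60)
    # 6K context sits inside the 4-8K budget; 20 query tokens and 300 output
    # tokens are hypothetical.
    print(f"per query: ${query_cost(20, 6_000, 300, rates):.6f}")
```

Vector DB hosting is left out of both functions on purpose: it is either a fixed monthly line item or a per-document storage cost, so it is added on top rather than per query.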

Recommended models

Provider	Model	Why
OpenAI	text-embedding-3-small	Cheap, dense, default embedding pick.
OpenAI	gpt-4o-mini	Synthesizes retrieved chunks at low cost.
Anthropic	claude-sonnet-4-6	When answer quality matters more than cost.

Example scenario

Setup

$49/mo plan, 1,000 docs indexed (avg 5K tokens), 500 queries/mo with 6K-token context per query.
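
As a rough worked version of this setup, the script below computes the token volumes and an approximate monthly margin. The plan price, document counts, and context size come from the scenario; the per-million-token rates, output tokens per answer, query length, and vector DB cost are assumptions added for illustration and should be replaced with your own figures.

```python
# From the scenario
PLAN_PRICE = 49.00          # USD per month
DOCS = 1_000
TOKENS_PER_DOC = 5_000
QUERIES_PER_MONTH = 500
CONTEXT_TOKENS = 6_000      # generation input tokens per query

# Assumptions (not from the scenario): replace with real figures
EMBED_RATE = 0.02           # USD per 1M embedding tokens
GEN_INPUT_RATE = 0.15       # USD per 1M generation input tokens
GEN_OUTPUT_RATE = 0.60      # USD per 1M generation output tokens
QUERY_TOKENS = 20           # tokens embedded per search
OUTPUT_TOKENS = 400         # tokens generated per answer
VECTOR_DB_MONTHLY = 10.00   # fixed hosting cost, USD per month

# Indexing: one-time, 1,000 docs x 5K tokens = 5M embedding tokens
indexing_tokens = DOCS * TOKENS_PER_DOC
indexing_cost = indexing_tokens / 1_000_000 * EMBED_RATE

# Querying: recurring, 500 queries x 6K tokens = 3M generation input tokens
generation_input_tokens = QUERIES_PER_MONTH * CONTEXT_TOKENS
generation_output_tokens = QUERIES_PER_MONTH * OUTPUT_TOKENS
query_embedding_tokens = QUERIES_PER_MONTH * QUERY_TOKENS

monthly_query_cost = (
    generation_input_tokens / 1_000_000 * GEN_INPUT_RATE
    + generation_output_tokens / 1_000_000 * GEN_OUTPUT_RATE
    + query_embedding_tokens / 1_000_000 * EMBED_RATE
)

monthly_cost = monthly_query_cost + VECTOR_DB_MONTHLY
monthly_margin = PLAN_PRICE - monthly_cost

print(f"one-time indexing cost: ${indexing_cost:.2f}")
print(f"monthly query cost:     ${monthly_query_cost:.4f}")
print(f"monthly margin:         ${monthly_margin:.2f}")
```

Under these assumed rates, indexing is a small share of the total and generation input tokens make up most of the per-query cost, which is why context size per query is the number to watch.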

Watch out for

Onboarding-day indexing burst — bill it as a one-time setup fee or amortize it.
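
The two options can be compared with a short sketch. The function names and the dollar figures in the usage lines are hypothetical; the point is only to show how spreading the burst over several months changes the first-month margin.

```python
def setup_fee(indexing_burst_cost: float, markup: float = 1.0) -> float:
    """Option 1: pass the burst through as a one-time fee (markup is an assumption)."""
    return indexing_burst_cost * markup


def first_month_margin(plan_price: float, monthly_query_cost: float,
                       indexing_burst_cost: float,
                       amortize_months: int = 1) -> float:
    """Option 2: absorb the burst, spread evenly over amortize_months.

    With amortize_months=1 the whole burst lands in the first month.
    """
    if amortize_months < 1:
        raise ValueError("amortize_months must be at least 1")
    indexing_share = indexing_burst_cost / amortize_months
    return plan_price - monthly_query_cost - indexing_share


if __name__ == "__main__":
    # Hypothetical heavy-indexer customer on a $49 plan.
    print(first_month_margin(49.0, 5.0, 30.0))                      # burst absorbed in month one
    print(first_month_margin(49.0, 5.0, 30.0, amortize_months=12))  # burst spread over a year
```

Amortizing smooths the reported margin but does not change the cash spent on day one, so a customer who churns early still leaves part of the burst unrecovered.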

Run the numbers for your RAG & Search product

Free tier covers everything on this page. Pro unlocks 30+ currencies and live FX.