Voice & Transcription SaaS Pricing Calculator
Price audio products billed per minute — transcription, voice cloning, and TTS.
Voice products bill in minutes, not tokens — but the underlying cost varies 10× across providers and quality tiers. Calcaas separates audio-in (transcription) from audio-out (TTS) so you can model both sides of a voice-agent product correctly.
Common pricing models
Per-minute passthrough
Direct minute-based billing with a margin multiplier; classic transcription model.
Subscription + minute cap
Monthly tier with included minutes; overage billed per minute.
Per-call (voice agent)
Agent products price per completed call, bundling ASR, LLM, and TTS into one number.
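The first two models come down to simple arithmetic. Here is a minimal Python sketch. The function names are hypothetical, and the 3× multiplier and $0.05 overage rate are illustrative placeholders rather than recommendations; only the $0.006/min figure comes from this page.

```python
def passthrough_price(minutes: float, cost_per_min: float,
                      margin_multiplier: float) -> float:
    """Per-minute passthrough: provider cost times a margin multiplier."""
    return minutes * cost_per_min * margin_multiplier


def subscription_bill(minutes_used: float, base_fee: float,
                      included_minutes: float, overage_per_min: float) -> float:
    """Subscription + minute cap: flat fee plus per-minute overage."""
    # Usage below the cap never produces a negative overage.
    overage_minutes = max(0.0, minutes_used - included_minutes)
    return base_fee + overage_minutes * overage_per_min


# Illustrative inputs only (assumed, not quoted prices).
print(round(passthrough_price(1000, 0.006, 3.0), 2))
print(round(subscription_bill(600, 25.0, 500, 0.05), 2))
```

The per-call model can be priced the same way once the cost of a typical call is known.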
Cost components to model
Audio-to-text minutes
Transcription cost per minute of input audio.
Text-to-speech characters
TTS is billed per character; multiply the per-character rate by the average utterance length and the number of utterances.
LLM turn cost
For voice agents, the LLM in the middle is often the largest component.
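The three components above can be combined into a per-call cost. This is a rough sketch, not Calcaas's actual formula. `UnitCosts` and `call_cost` are hypothetical names, and the TTS and LLM rates in the usage lines are assumed placeholders; only the $0.006/min transcription rate comes from this page.

```python
from dataclasses import dataclass


@dataclass
class UnitCosts:
    asr_per_min: float   # transcription cost per minute of input audio
    tts_per_char: float  # TTS cost per output character
    llm_per_turn: float  # LLM cost per conversational turn


def call_cost(costs: UnitCosts, audio_in_minutes: float,
              turns: int, avg_utterance_chars: int) -> dict[str, float]:
    """Break one voice-agent call into ASR, TTS, and LLM cost."""
    asr = audio_in_minutes * costs.asr_per_min
    # TTS: per-character rate x average utterance length x number of turns.
    tts = turns * avg_utterance_chars * costs.tts_per_char
    llm = turns * costs.llm_per_turn
    return {"asr": asr, "tts": tts, "llm": llm, "total": asr + tts + llm}


# Placeholder rates for illustration (assumptions, not provider prices).
costs = UnitCosts(asr_per_min=0.006, tts_per_char=0.0001, llm_per_turn=0.01)
breakdown = call_cost(costs, audio_in_minutes=3.0, turns=10, avg_utterance_chars=200)
for name, value in breakdown.items():
    print(f"{name}: ${value:.4f}")
```

Returning the breakdown rather than a single total makes it easy to see which component dominates before setting a per-call price.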
Recommended models
| Provider | Model | Why |
|---|---|---|
| OpenAI | whisper-1 | Strong default for transcription at $0.006/min. |
| Deepgram | nova-3 | Faster and cheaper at scale for streaming use cases. |
| ElevenLabs | eleven-v3 | Premium TTS — price your top tier accordingly. |
Example scenario
Setup
A $25/mo plan that includes 500 minutes of transcription on Whisper and 50K TTS characters on ElevenLabs.
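The transcription side of this plan can be checked by hand: 500 minutes at the $0.006/min Whisper rate listed above is $3.00. This page does not state the ElevenLabs rate, so the sketch below takes the per-character TTS cost as an input and uses an assumed placeholder value. `plan_margin` is a hypothetical helper name.

```python
def plan_margin(price: float, minutes: float, asr_per_min: float,
                tts_chars: int, tts_per_char: float) -> tuple[float, float]:
    """Return (provider cost, gross margin fraction) for a fully used plan."""
    cost = minutes * asr_per_min + tts_chars * tts_per_char
    return cost, (price - cost) / price


# ASSUMPTION: placeholder TTS rate, not an actual ElevenLabs price.
TTS_PER_CHAR_PLACEHOLDER = 0.0001

cost, margin = plan_margin(
    price=25.0,
    minutes=500,
    asr_per_min=0.006,   # Whisper rate from the table above
    tts_chars=50_000,
    tts_per_char=TTS_PER_CHAR_PLACEHOLDER,
)
print(f"cost: ${cost:.2f}, gross margin: {margin:.0%}")
```

Swap in the provider's real per-character rate to see how much of the $25 remains once a customer uses the full allowance.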
Watch out for
In voice-agent products, the LLM cost per turn often dwarfs ASR and TTS combined.
Run the numbers for your voice & transcription product
Free tier covers everything on this page. Pro unlocks 30+ currencies and live FX.