RAG Cost Calculator
Retrieval-augmented generation (RAG) sends retrieved document chunks to the model as part of every prompt. Estimate costs from how much context you send per request and how many requests you make.
[ RAG COST PROJECTION ]
$4.20/mo
1. Pick your engine
| Model | Tier | Description | Input / 1M tokens | Output / 1M tokens |
|---|---|---|---|---|
| GPT-4o | flagship | Best for most tasks | $2.50 | $10.00 |
| GPT-4o mini | fast | Fastest & cheapest OpenAI option | $0.15 | $0.60 |
| o1 | flagship | Advanced reasoning | $15.00 | $60.00 |
| o1-mini | standard | Fast reasoning model | $3.00 | $12.00 |
| GPT-3.5 Turbo | fast | Legacy fast model | $0.50 | $1.50 |
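Each request is billed separately for input and output tokens, so the per-request cost is (input tokens × input price + output tokens × output price) ÷ 1,000,000. A minimal Python sketch using the prices listed above; the dictionary keys and the 1,000-input / 100-output request are illustrative assumptions, not values taken from this calculator.

```python
# USD per 1M tokens as (input, output), copied from the model list above.
# Keys are illustrative labels; list prices change over time.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
    "o1-mini": (3.00, 12.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}


def cost_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request: tokens x price-per-1M / 1M."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 1,000 input tokens, 100 output tokens on GPT-4o mini.
print(f"${cost_per_request('gpt-4o-mini', 1_000, 100):.6f}")  # $0.000210
```

For o1 and o1-mini, hidden reasoning tokens are billed as output tokens, so the real output count can be much larger than the visible answer.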
2. Scale & volume
1,000
20
Input tokens per request: includes retrieved context + instructions
Output tokens per request: length of the AI's response
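In RAG the input side is usually the larger number. A rough way to estimate it, assuming a fixed system prompt, a top-k retriever that returns chunks of similar size, and a short user question (all parameter names and values here are hypothetical):

```python
def estimate_input_tokens(instruction_tokens: int, top_k: int,
                          chunk_tokens: int, query_tokens: int) -> int:
    """Approximate input tokens for one RAG request.

    Hypothetical model: fixed instructions + top_k retrieved chunks of
    about chunk_tokens each + the user's question.
    """
    return instruction_tokens + top_k * chunk_tokens + query_tokens


# Hypothetical setup: 300-token system prompt, 5 chunks of 500 tokens, 50-token question.
baseline = estimate_input_tokens(300, 5, 500, 50)
# Doubling top_k to improve recall nearly doubles the input.
wider = estimate_input_tokens(300, 10, 500, 50)
print(baseline, wider)  # 2850 5350
```

Because retrieved chunks dominate the total, top_k × chunk size is the main lever on input cost.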
3. Review projections
Per request: $0.000210
Per day: $0.1400
Per month: $4.20
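Monthly cost is per-request cost × monthly requests, and the daily figure is that monthly total spread over the month. A sketch, assuming a 30-day month and 20,000 requests per month (the volume implied by $4.20 ÷ $0.000210); the `project` helper is hypothetical:

```python
DAYS_PER_MONTH = 30  # assumption: the daily figure spreads a 30-day month


def project(cost_per_request: float, requests_per_month: int) -> dict[str, float]:
    """Scale a per-request cost to daily and monthly totals."""
    per_month = cost_per_request * requests_per_month
    return {
        "per_request": cost_per_request,
        "per_day": per_month / DAYS_PER_MONTH,
        "per_month": per_month,
    }


p = project(0.000210, 20_000)
print(f"Per day ${p['per_day']:.4f}, per month ${p['per_month']:.2f}")
# Per day $0.1400, per month $4.20
```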
Cost as you grow
| Users | Requests / mo | Monthly | Yearly |
|---|---|---|---|
| 100 | 2,000 | $0.42 | $5.04 |
| 1,000 | 20,000 | $4.20 | $50.40 |
| 10,000 | 200,000 | $42.00 | $504.00 |
| 100,000 | 2,000,000 | $420.00 | $5,040.00 |
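When projecting growth, request volume should scale with users × requests per user. A sketch, assuming every user sends the same hypothetical 20 requests per month at $0.000210 per request; `growth_table` is an illustrative helper, not part of this calculator:

```python
def growth_table(cost_per_request: float, requests_per_user_per_month: int,
                 user_counts: list[int]) -> list[tuple[int, int, float, float]]:
    """Return (users, requests/mo, monthly $, yearly $) for each user count."""
    rows = []
    for users in user_counts:
        requests = users * requests_per_user_per_month  # volume grows with users
        monthly = requests * cost_per_request
        rows.append((users, requests, monthly, monthly * 12))
    return rows


# Hypothetical: 20 requests per user per month, $0.000210 per request.
for users, requests, monthly, yearly in growth_table(0.000210, 20, [100, 1_000, 10_000, 100_000]):
    print(f"{users:>7,} users | {requests:>9,} req/mo | ${monthly:,.2f}/mo | ${yearly:,.2f}/yr")
```

Multiplying users by the current total monthly volume, rather than by requests per user, would overstate every row by a factor equal to the current user count.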
RAG vs Long Context
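RAG (retrieval-augmented generation) is usually cheaper than sending an entire large document to a long-context model (such as Gemini 1.5 Pro) on every request, because you pay input-token prices only for the retrieved chunks. However, if you retrieve more or larger chunks to improve answer accuracy, your context per request grows, and input costs grow with it. Watch your input token costs closely.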
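To see the size of the gap, compare input cost alone for one request at a single price. A sketch assuming GPT-4o mini's $0.15 per 1M input tokens from the list above, a hypothetical 100,000-token document (within GPT-4o mini's 128K-token context window) sent whole, and a hypothetical 2,000 tokens of retrieved context. Long-context models such as Gemini 1.5 Pro have their own pricing, so only the token ratio carries over:

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """USD cost of sending `tokens` input tokens at a per-1M-token price."""
    return tokens * price_per_million / 1_000_000


PRICE_PER_M = 0.15         # GPT-4o mini input price from the model list above
FULL_DOC_TOKENS = 100_000  # hypothetical: whole document on every request
RAG_TOKENS = 2_000         # hypothetical: retrieved chunks + instructions

full = input_cost(FULL_DOC_TOKENS, PRICE_PER_M)
rag = input_cost(RAG_TOKENS, PRICE_PER_M)
print(f"Full document: ${full:.4f}/request, RAG: ${rag:.4f}/request, {full / rag:.0f}x cheaper")
# Full document: $0.0150/request, RAG: $0.0003/request, 50x cheaper
```

Raising RAG_TOKENS (more or larger chunks) shrinks that ratio, which is the trade-off to watch.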

