RAG Cost Calculator

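RAG (retrieval-augmented generation) sends retrieved document chunks to the model with every request, which inflates input token counts. Estimate your costs from the average prompt size (retrieved context plus instructions), response size, and request volume.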

[ RAG COST PROJECTION ]

$4.20/mo

Pick your engine

Model	Tier	Description	Input / 1M tokens	Output / 1M tokens
GPT-4o	flagship	Best for most tasks	$2.50	$10.00
GPT-4o mini	fast	Fastest & cheapest OpenAI option	$0.15	$0.60
o1	flagship	Advanced reasoning	$15.00	$60.00
o1-mini	standard	Fast reasoning model	$3.00	$12.00
GPT-3.5 Turbo	fast	Legacy fast model	$0.50	$1.50
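
To see how the listed prices translate into a per-request cost, here is a minimal Python sketch. The prices are copied from the engine list above; the token counts (1,000 prompt tokens, 100 response tokens) and the function name `cost_per_request` are assumptions for illustration, not values taken from the calculator.

```python
# Prices in USD per 1M tokens (input, output), copied from the engine list above.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
    "o1-mini": (3.00, 12.00),
    "GPT-3.5 Turbo": (0.50, 1.50),
}


def cost_per_request(model: str, prompt_tokens: int, response_tokens: int) -> float:
    """Return the USD cost of one request: input tokens plus output tokens."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + response_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1,000 prompt tokens (retrieved context + instructions)
    # and a 100-token response.
    for model in PRICES:
        print(f"{model:<14} ${cost_per_request(model, 1_000, 100):.6f}")
```

With these assumed sizes, GPT-4o mini works out to $0.000210 per request, which happens to match the Per Request figure in the projections, though the page does not show which inputs produced it.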

Scale & Volume

Monthly Active Users: 1,000

Requests per user / month: 20

Avg. Prompt Size (tokens)

Includes retrieved context + instructions

Avg. Response Size (tokens)

Length of the AI's response

Review Projections

Per Request: $0.000210

Per Day: $0.1400

Per Month: $4.20
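
The three projections are related by simple arithmetic. A rough sketch follows, assuming a 30-day month (consistent with $4.20 / 30 = $0.14, but not stated on the page); the function names are hypothetical.

```python
def monthly_cost(users: int, requests_per_user: int, cost_per_request: float) -> float:
    """Monthly spend = total monthly requests x cost of one request."""
    return users * requests_per_user * cost_per_request


def daily_cost(monthly: float, days_per_month: int = 30) -> float:
    """Average daily spend; the 30-day month is an assumption."""
    return monthly / days_per_month


if __name__ == "__main__":
    # Inputs shown on the page: 1,000 users, 20 requests each, $0.000210 per request.
    monthly = monthly_cost(1_000, 20, 0.000210)
    print(f"Per Month ${monthly:.2f}")
    print(f"Per Day   ${daily_cost(monthly):.4f}")
```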

Cost as you grow

Users	Requests / Mo	Monthly	Yearly
100	2.0K	$0.42	$5.04
1.0K	20.0K	$4.20	$50.40
10.0K	200.0K	$42.00	$504.00
100.0K	2.0M	$420.00	$5,040.00
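
A growth table like this can be generated by repeating the monthly calculation for each user count. The sketch below assumes the 20 requests per user and $0.000210 per request from the settings above, and takes the yearly figure as 12 times the monthly one; `growth_rows` is a hypothetical helper.

```python
def growth_rows(user_counts, requests_per_user=20, cost_per_request=0.000210):
    """Yield (users, monthly_requests, monthly_usd, yearly_usd) for each user count."""
    for users in user_counts:
        requests = users * requests_per_user
        monthly = requests * cost_per_request
        yield users, requests, monthly, monthly * 12


if __name__ == "__main__":
    print("Users\tRequests / Mo\tMonthly\tYearly")
    for users, requests, monthly, yearly in growth_rows([100, 1_000, 10_000, 100_000]):
        print(f"{users:,}\t{requests:,}\t${monthly:,.2f}\t${yearly:,.2f}")
```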

RAG vs Long Context

RAG is usually cheaper than feeding an entire large document into a long-context model (such as Gemini 1.5 Pro) on every request, because only the retrieved chunks are billed as input tokens. However, as retrieval accuracy becomes more important, you will tend to retrieve more or larger chunks, and your context sizes will grow. Watch your input token costs closely.
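
To make the comparison concrete, the sketch below compares input cost per request for a RAG prompt and a full-document prompt. Every number in it is a hypothetical assumption (chunk count, chunk size, document length, and a single shared input price of $2.50 per 1M tokens). Real long-context models are priced differently, so treat it only as an illustration of how token counts drive the gap.

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """USD cost of sending `tokens` input tokens at the given price per 1M tokens."""
    return tokens * price_per_million / 1_000_000


def rag_prompt_tokens(chunks: int, tokens_per_chunk: int, instruction_tokens: int) -> int:
    """Prompt size for RAG: retrieved chunks plus fixed instructions."""
    return chunks * tokens_per_chunk + instruction_tokens


if __name__ == "__main__":
    PRICE = 2.50  # assumed input price per 1M tokens, applied to both approaches
    rag_tokens = rag_prompt_tokens(chunks=4, tokens_per_chunk=500, instruction_tokens=200)
    full_doc_tokens = 200_000 + 200  # hypothetical 200k-token document plus instructions

    print(f"RAG:            {rag_tokens:>7,} tokens  ${input_cost(rag_tokens, PRICE):.4f}")
    print(f"Long context:   {full_doc_tokens:>7,} tokens  ${input_cost(full_doc_tokens, PRICE):.4f}")

    # Retrieving twice as many chunks nearly doubles the RAG input cost.
    more = rag_prompt_tokens(chunks=8, tokens_per_chunk=500, instruction_tokens=200)
    print(f"RAG (8 chunks): {more:>7,} tokens  ${input_cost(more, PRICE):.4f}")
```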