32.1 Why tokens cost more than you think
The Basic Math
Token pricing is usually quoted per 1 million tokens. For example, input might be \$0.35/1M and output might be \$1.05/1M. This sounds incredibly cheap—until you do the math on a real conversation.
A "simple" chat app sends the entire conversation history with every new message. If a user sends 10 messages:
- Msg 1: 100 tokens input
- Msg 2: 100 (original) + 100 (reply) + 100 (new) = 300 tokens input
- Msg 10: 9 × 200 (prior exchanges) + 100 (new) = 1,900 tokens input
You aren't paying for 10 messages. You are paying for the cumulative sum of the entire session history, re-read 10 times.
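The growth above can be sketched in a few lines. This is a toy model, assuming every user message and every reply is exactly 100 tokens as in the example:

```python
# Toy model: cumulative input tokens when the full history is resent each turn.
# Assumes every user message and every reply is ~100 tokens (per the example).
USER_TOKENS = 100
REPLY_TOKENS = 100

def input_tokens_for_turn(n: int) -> int:
    """Tokens sent on turn n: all prior exchanges plus the new message."""
    history = (n - 1) * (USER_TOKENS + REPLY_TOKENS)
    return history + USER_TOKENS

print(input_tokens_for_turn(10))                              # 1900
print(sum(input_tokens_for_turn(n) for n in range(1, 11)))    # 10000
```

Ten messages of 100 tokens each bills 10,000 input tokens: a 10× multiplier purely from resending history.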
The Hidden Multipliers
Several factors multiply your costs unexpectedly:
- System Instructions: A 500-token "persona" or "ruleset" is sent with every single turn. In a 20-turn conversation, that's 10,000 tokens just for the rules.
- Tool Definitions: If you give the model 20 tools to choose from, those definitions (schemas, descriptions) count as input tokens. They are resent every time.
- Chain of Thought: If you ask the model to "think step-by-step," it generates more output tokens. Since output is priced higher than input, verbose reasoning can dominate the cost of a request.
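The fixed overhead from the first two multipliers is easy to estimate. A rough sketch, using the 500-token persona from above and an assumed average of 150 tokens per tool schema (a made-up figure for illustration):

```python
# Sketch: fixed per-turn overhead from system instructions and tool schemas.
# SYSTEM_TOKENS matches the 500-token persona above; TOKENS_PER_TOOL is an
# assumed average schema size, not a measured value.
SYSTEM_TOKENS = 500
NUM_TOOLS = 20
TOKENS_PER_TOOL = 150

def fixed_overhead_per_turn() -> int:
    """Input tokens billed every turn before any user content is counted."""
    return SYSTEM_TOKENS + NUM_TOOLS * TOKENS_PER_TOOL

turns = 20
print(fixed_overhead_per_turn() * turns)  # 70000 tokens over a 20-turn chat
```

Under these assumptions, a 20-turn conversation spends 70,000 input tokens on scaffolding alone, before a single word of user content.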
If you paste a 5,000-line JSON file into context just to ask "what is the value of key X?", you pay for many thousands of input tokens (a line is usually several tokens) to get a one-token answer. Always filter data before sending it to the model.
The RAG Tax
Retrieval Augmented Generation (RAG) is efficient for knowledge, but expensive for tokens. If you retrieve 5 chunks of 1,000 tokens each for every query, you are adding 5,000 tokens of input overhead to every single question.
If the user asks "Hi", and your system blindly retrieves 5 documents about "Hi" from your vector DB, you just wasted money. Always check if retrieval is necessary before adding context.
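A cheap gate in front of the vector DB avoids this. A minimal sketch, where the small-talk list and word-count threshold are assumptions (a small classifier model works too):

```python
# Sketch: skip retrieval for small talk before querying the vector DB.
# The keyword set and length threshold are illustrative assumptions.
SMALL_TALK = {"hi", "hello", "hey", "thanks", "thank you", "bye"}

def needs_retrieval(query: str) -> bool:
    """Return False for greetings and trivially short queries."""
    q = query.strip().lower().rstrip("!.?")
    if q in SMALL_TALK:
        return False
    return len(q.split()) >= 3  # very short queries rarely need documents

print(needs_retrieval("Hi"))                         # False -- no RAG tax
print(needs_retrieval("How do I rotate API keys?"))  # True
```

Even this crude check eliminates the 5,000-token retrieval overhead for every greeting and acknowledgment in a conversation.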