32.1 Why tokens cost more than you think


The Basic Math

Token pricing is usually quoted per 1 million tokens. For example, input might be $0.35/1M and output might be $1.05/1M. This sounds incredibly cheap—until you do the math on a real conversation.
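The per-call arithmetic is simple enough to sketch. Using the illustrative rates above (not any real provider's prices):

```python
# Cost of a single API call at the example rates from the text.
INPUT_PRICE_PER_M = 0.35   # dollars per 1M input tokens (illustrative)
OUTPUT_PRICE_PER_M = 1.05  # dollars per 1M output tokens (illustrative)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: each side billed at its own rate."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# 2,000 tokens in, 500 tokens out: a fraction of a cent per call.
print(f"${call_cost(2000, 500):.6f}")
```

Per call this is negligible; the rest of this section is about how the multipliers stack up.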

A "simple" chat app sends the entire conversation history with every new message. If a user sends 10 messages:

  • Msg 1: 100 tokens input
  • Msg 2: 100 (original) + 100 (reply) + 100 (new) = 300 tokens input
  • …
  • Msg 10: ~2,000 tokens input

You aren't paying for 10 messages. You are paying for the cumulative sum of the entire session history, re-read 10 times.
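The resend effect can be modeled in a few lines. A sketch, assuming every user message and every model reply is roughly 100 tokens:

```python
# Rough model of a chat app that resends the full history each turn.
# Assumes every user message and every model reply is ~100 tokens.
MSG_TOKENS = 100

def input_tokens_for_turn(turn: int) -> int:
    # History before this turn: (turn - 1) user messages and
    # (turn - 1) replies, plus the new message itself.
    return (turn - 1) * 2 * MSG_TOKENS + MSG_TOKENS

total = sum(input_tokens_for_turn(t) for t in range(1, 11))
print(input_tokens_for_turn(10))  # ~1,900 tokens for message 10 alone
print(total)                      # 10,000 input tokens across the session
```

Ten messages of ~100 tokens each cost not 1,000 input tokens but 10,000, because the history is re-read on every turn.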

The Hidden Multipliers

Several factors multiply your costs unexpectedly:

  • System Instructions: A 500-token "persona" or "ruleset" is sent with every single turn. In a 20-turn conversation, that's 10,000 tokens just for the rules.
  • Tool Definitions: If you give the model 20 tools to choose from, those definitions (schemas, descriptions) count as input tokens. They are resent every time.
  • Chain of Thought: If you ask the model to "think step-by-step," it generates more output tokens. Since output is more expensive than input, verbose reasoning is the costliest operation.
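These fixed overheads compound the same way the history does. A sketch, where the system prompt size, per-tool schema size, and tool count are all illustrative assumptions:

```python
# Hypothetical per-turn fixed overhead: the system prompt and tool
# definitions are resent with every request. All sizes are assumptions.
SYSTEM_PROMPT_TOKENS = 500
TOOL_DEF_TOKENS = 150   # assumed average tokens per tool schema
NUM_TOOLS = 20

fixed_per_turn = SYSTEM_PROMPT_TOKENS + TOOL_DEF_TOKENS * NUM_TOOLS

# Over a 20-turn conversation, the overhead alone:
print(fixed_per_turn * 20)  # 70,000 tokens before any actual conversation
```

At these assumed sizes, the rules and schemas cost seven times more than the 10,000-token conversation they wrap.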

The "Verbose API" Trap

If you paste a JSON file worth 5,000 tokens into context just to ask "what is the value of key X?", you paid for 5,000 input tokens to get a one-token answer. Always filter data before sending it to the model.
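Filtering first usually means a local lookup before the prompt is built. A sketch, where the key name and sample data are hypothetical:

```python
import json

# Extract the one value you need locally and send only that,
# instead of pasting the whole document into the prompt.
def lookup_key(raw_json: str, key: str) -> str:
    """Return the value for `key` as a string, or "" if absent."""
    data = json.loads(raw_json)
    return str(data.get(key, ""))

raw = '{"x": 42, "y": [1, 2, 3]}'     # stand-in for the huge file
prompt_context = lookup_key(raw, "x")  # send "42", not 5,000 tokens
print(prompt_context)
```

The model never needs to see data your own code can already answer from.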

The RAG Tax

Retrieval Augmented Generation (RAG) is efficient for knowledge, but expensive for tokens. If you retrieve 5 chunks of 1,000 tokens each for every query, you are adding 5,000 tokens of input overhead to every single question.

If the user asks "Hi", and your system blindly retrieves 5 documents about "Hi" from your vector DB, you just wasted money. Always check if retrieval is necessary before adding context.
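One cheap guard is a heuristic gate in front of the retriever. A sketch — the word-count threshold and the greeting list are assumptions to tune, not a standard:

```python
# Skip retrieval for short, small-talk queries. The threshold and the
# greeting set are illustrative assumptions, not a fixed rule.
GREETINGS = {"hi", "hello", "hey", "thanks", "bye"}

def needs_retrieval(query: str) -> bool:
    words = query.lower().strip("!?. ").split()
    if not words:
        return False
    if len(words) <= 2 and words[0] in GREETINGS:
        return False  # small talk: answer directly, skip the 5,000-token tax
    return True

print(needs_retrieval("Hi"))                           # False
print(needs_retrieval("How do I rotate an API key?"))  # True
```

A small classifier or an embedding-similarity threshold can replace the keyword list later; the point is that retrieval should be conditional, not reflexive.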

Where to go next