39.5 Prompt compression and distillation
The 100k Token Problem
Context is expensive: every token you send costs money and latency. RAG pipelines routinely retrieve more than the model needs, and chat history grows without bound. Compression is how you keep long-running prompts affordable.
Compression Techniques
- Auto-Summarization: Every 10 turns, ask a cheap model to summarize the history into one paragraph, then replace the history with that paragraph. A common refinement keeps the last few turns verbatim so the model still sees recent detail.
- Lingua Franca: Use specific, dense language. Instead of "Please write a function that takes a string...", give the signature directly: "def parse(s: str) -> dict:". Models parse code better than prose, and a signature packs the full spec into far fewer tokens.
- Filter irrelevant keys: If you are passing a JSON API response to the model, delete every key you don't use before putting the payload in the prompt. API responses are full of IDs, ETags, and debug fields that burn tokens for nothing.
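The auto-summarization loop above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: `compress_history`, `summarize`, and the thresholds are all illustrative names, and `summarize` stands in for a call to whatever cheap model you use.

```python
def compress_history(history, summarize, keep_last=4, max_turns=10):
    """Collapse older turns into one summary message once the history
    exceeds max_turns, keeping the most recent turns verbatim.

    history:   list of {"role": ..., "content": ...} dicts
    summarize: callable(str) -> str, e.g. a wrapper around a cheap model
    """
    if len(history) <= max_turns:
        return history  # still short enough; send as-is
    older, recent = history[:-keep_last], history[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = summarize(transcript)  # one cheap-model call per compression
    summary_msg = {"role": "system",
                   "content": f"Summary of earlier conversation: {summary}"}
    return [summary_msg] + recent
```

Run it before each request; once the history crosses the threshold, the next prompt carries one summary message plus the last few turns instead of the full transcript.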
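The "Lingua Franca" point is easy to see side by side. Both strings below specify the same function (the example task is made up); the signature-plus-example version is markedly shorter in characters, and the gap is similar in tokens.

```python
# Verbose, English-only specification of a (hypothetical) parsing task.
verbose = ("Please write a function that takes a string of key=value pairs "
           "separated by semicolons and returns a dictionary mapping each "
           "key to its corresponding value.")

# Dense version: signature plus one worked example carries the same spec.
dense = "Implement: def parse(s: str) -> dict  # 'a=1;b=2' -> {'a': '1', 'b': '2'}"
```

The dense prompt also tends to produce better code, because the signature pins down types and the inline example pins down edge behavior that prose leaves ambiguous.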
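Key filtering can be done generically with a small recursive helper. A sketch, assuming one allow-list applied at every nesting level (a real pipeline might want per-path rules; `filter_keys` is an illustrative name):

```python
def filter_keys(obj, keep):
    """Recursively drop dict keys not in `keep` before prompting.

    The same allow-list applies at every nesting level in this sketch;
    lists are walked element by element, scalars pass through unchanged.
    """
    if isinstance(obj, dict):
        return {k: filter_keys(v, keep) for k, v in obj.items() if k in keep}
    if isinstance(obj, list):
        return [filter_keys(v, keep) for v in obj]
    return obj
```

For a typical API response this can cut the payload dramatically, since metadata fields usually outnumber the fields your task actually reads.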