32.2 Summarize and compress context safely
Overview and links for this section of the guide.
The Context Problem
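Large contexts are expensive: input tokens are typically billed on every request, so a long history or a pile of retrieved documents is paid for again on each call. Naive truncation lowers the cost but can drop exactly the information the model needs.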
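To make the cost concrete, here is a rough sketch of how input tokens add up over a conversation. It assumes a stateless chat API where each request re-sends the full history, and it uses the common rule of thumb of about 4 characters per token for English text, which is an approximation and not a real tokenizer. The names `estimateTokens` and `cumulativeInputTokens` are hypothetical.

```typescript
// Rough heuristic: ~4 characters per token for English text (an assumption;
// use your provider's tokenizer for real numbers).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Total input tokens sent across a conversation when every request
// re-sends all previous messages.
function cumulativeInputTokens(messages: string[]): number {
  let contextTokens = 0; // tokens currently in the context
  let billedTokens = 0;  // input tokens summed over all requests so far
  for (const message of messages) {
    contextTokens += estimateTokens(message);
    billedTokens += contextTokens; // each turn pays for the whole history again
  }
  return billedTokens;
}

// Ten 400-character messages: the context ends at 1,000 tokens,
// but the ten requests together send 5,500 input tokens.
const turns = Array.from({ length: 10 }, () => "x".repeat(400));
console.log(cumulativeInputTokens(turns)); // 5500
```

Under these assumptions the total grows quadratically with the number of turns, which is why compressing older history pays off more the longer a conversation runs.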
Compression Techniques
// 1. Summarize old conversations
async function compressHistory(messages: Message[]): Promise<string> {
  // Short histories are cheap enough to send as-is
  if (messages.length < 10) return formatMessages(messages);

  // Keep recent messages verbatim
  const recent = messages.slice(-5);
  const old = messages.slice(0, -5);

  // Summarize older messages
  const summary = await model.generate(`
    Summarize this conversation in 3 bullet points:
    ${formatMessages(old)}
  `);

  return `Previous context: ${summary}\n\n${formatMessages(recent)}`;
}

// 2. Extract relevant chunks only
async function selectiveContext(docs: Document[], query: string) {
  // Don't send every document; send only the 3 most relevant to the query
  const ranked = await rerank(docs, query);
  return ranked.slice(0, 3);
}

// 3. Hierarchical summarization
async function hierarchicalSummary(longDoc: string) {
  // Split into chunks
  const chunks = splitIntoChunks(longDoc, 2000);

  // Summarize each chunk (in parallel)
  const summaries = await Promise.all(
    chunks.map(c => summarize(c))
  );

  // Summarize the summaries
  return summarize(summaries.join('\n'));
}
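The snippets above call `splitIntoChunks` without showing it. One possible implementation is sketched below. It is an assumption, not the guide's own helper: it treats the second argument as a character limit (the original does not say whether `2000` means characters or tokens) and packs whole paragraphs together so each summary sees coherent text.

```typescript
// Hypothetical implementation of splitIntoChunks: packs paragraphs into
// chunks of at most maxChars characters. Paragraphs are separated by blank lines.
function splitIntoChunks(text: string, maxChars: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map(p => p.trim())
    .filter(p => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    // A single oversized paragraph is hard-split at maxChars.
    if (paragraph.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < paragraph.length; i += maxChars) {
        chunks.push(paragraph.slice(i, i + maxChars));
      }
      continue;
    }

    // Otherwise append to the current chunk if it still fits.
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}

const sampleDoc = ["a".repeat(1200), "b".repeat(1200), "c".repeat(500)].join("\n\n");
console.log(splitIntoChunks(sampleDoc, 2000).map(c => c.length)); // [ 1200, 1702 ]
```

Splitting on paragraph boundaries is a design choice: fixed-width cuts are simpler but can slice a sentence in half, which tends to hurt the per-chunk summaries.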
Safe Compression
// Verify compression doesn't lose critical info
async function safeCompress(context: string, query: string): Promise<string> {
  const compressed = await compress(context);

  // Test: can we still answer the query?
  const testAnswer = await model.generate(`
    Context: ${compressed}
    Query: ${query}
    Can this query be answered from the context alone? Reply YES or NO first, then give a reason.
  `);

  // Check only the leading verdict: includes('NO') would also match "NOT" or "NOTE" inside a YES reply
  if (!/^\s*YES\b/i.test(testAnswer)) {
    // Compression too aggressive (or the verdict is unclear), use original
    return context;
  }
  return compressed;
}
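The model-based check costs an extra call per compression. A cheaper complementary guard, offered here as an assumption and not as part of the approach above, is to confirm that literal values a summary can easily drop or alter (amounts, dates, IDs) still appear in the compressed text. The function names and the regular expression are hypothetical and deliberately simple: they only catch tokens that contain a digit.

```typescript
// Collects tokens containing at least one digit (amounts, dates, versions, IDs).
function extractCriticalTerms(text: string): string[] {
  const matches = text.match(/[A-Za-z0-9_.,:\/-]*\d[A-Za-z0-9_.,:\/-]*/g) ?? [];
  // Drop trailing punctuation picked up from the sentence, then de-duplicate.
  const cleaned = matches
    .map(m => m.replace(/[.,:\/-]+$/, ""))
    .filter(t => t.length > 0);
  return Array.from(new Set(cleaned));
}

// Returns the critical terms from the original that no longer appear verbatim.
function missingCriticalTerms(original: string, compressed: string): string[] {
  return extractCriticalTerms(original).filter(term => !compressed.includes(term));
}

const originalText = "Invoice INV-2041 for $1,250.00 is due on 2024-03-15.";
const compressedText = "Invoice INV-2041 is due in mid-March.";
console.log(missingCriticalTerms(originalText, compressedText));
// [ '1,250.00', '2024-03-15' ]
```

A non-empty result is a signal to fall back to the original context or to re-run compression with instructions to preserve those values. The check is literal, so a reformatted value (for example `March 15, 2024`) is reported as missing even though the information survived.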