32.3 Choosing small vs large models strategically
When to Use Each
| Task | Model tier | Reason |
|---|---|---|
| Classification | Flash | Short, constrained output; latency matters |
| Extraction | Flash | Structured output; no creativity needed |
| Summarization | Flash | Condenses existing text rather than producing new content |
| Code generation | Pro | Quality matters more than speed |
| Complex reasoning | Pro | Needs multi-step (chain-of-thought) reasoning |
| Multi-step planning | Pro | Must keep track of intermediate state across many steps |
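The table can be encoded as a lookup so the default choice for each task type lives in one place. This is a minimal sketch: the `TaskType` names, the `'flash' | 'pro'` tier labels, and `tierFor` are hypothetical and not part of any SDK.

```typescript
// Hypothetical task categories, one per row of the table.
type TaskType =
  | 'classification'
  | 'extraction'
  | 'summarization'
  | 'code-generation'
  | 'complex-reasoning'
  | 'multi-step-planning';

// Hypothetical tier labels; map them to concrete model names elsewhere.
type Tier = 'flash' | 'pro';

// Default tier per task type. Record<TaskType, Tier> makes the compiler
// reject the map if a task type is added without choosing a tier for it.
const DEFAULT_TIER: Record<TaskType, Tier> = {
  classification: 'flash',
  extraction: 'flash',
  summarization: 'flash',
  'code-generation': 'pro',
  'complex-reasoning': 'pro',
  'multi-step-planning': 'pro',
};

function tierFor(task: TaskType): Tier {
  return DEFAULT_TIER[task];
}

console.log(tierFor('extraction')); // flash
console.log(tierFor('multi-step-planning')); // pro
```

Keeping the defaults in a typed map means the table and the code cannot silently drift apart when a new task type is introduced.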
Model Cascade
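```typescript
// Try the cheap model first; escalate only when its answer isn't confident enough.
// ModelClient and `confidence` are assumptions: the Gemini API does not hand back a
// single confidence score, so you must derive one (e.g. a validator or self-rating).
interface ModelResponse { text: string; confidence: number } // confidence in 0..1
interface ModelClient { generate(prompt: string): Promise<ModelResponse> }

async function cascadeGenerate(
  prompt: string,
  flash: ModelClient,
  pro: ModelClient,
  threshold = 0.9,
): Promise<ModelResponse> {
  // Step 1: try Flash
  const flashResponse = await flash.generate(prompt);
  // Step 2: accept Flash's answer if it clears the confidence threshold
  if (flashResponse.confidence > threshold) {
    return flashResponse;
  }
  // Step 3: escalate to Pro (this request now pays for both calls)
  console.log('Escalating to Pro due to low confidence');
  return pro.generate(prompt);
}
// Illustrative target: if ~80% of requests clear the threshold on Flash,
// only ~20% pay Pro prices on top of the Flash call.
```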
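The 80/20 split is a scenario rather than a guarantee, and it hides one detail: an escalated request pays for the Flash call and the Pro call. A minimal sketch of the arithmetic, assuming both calls see the same prompt and counting only input-token prices (the $0.075 and $1.25 per 1M figures from the routing example below); `blendedCascadePricePerM` and `breakEvenEscalationRate` are hypothetical helper names.

```typescript
// Blended input-token price (USD per 1M tokens of traffic) for a two-tier cascade.
// Assumptions: every request runs on Flash first, a fraction `escalationRate` is
// re-sent to Pro with the same prompt, and output tokens plus any confidence-check
// overhead are ignored.
function blendedCascadePricePerM(
  flashPricePerM: number, // USD per 1M input tokens
  proPricePerM: number, // USD per 1M input tokens
  escalationRate: number, // fraction of requests escalated, 0..1
): number {
  if (escalationRate < 0 || escalationRate > 1) {
    throw new RangeError('escalationRate must be between 0 and 1');
  }
  // Flash cost is always paid; Pro cost is paid only for escalated requests.
  return flashPricePerM + escalationRate * proPricePerM;
}

// The cascade beats Pro-only while flash + r * pro < pro, i.e. r < 1 - flash / pro.
function breakEvenEscalationRate(flashPricePerM: number, proPricePerM: number): number {
  return 1 - flashPricePerM / proPricePerM;
}

console.log(blendedCascadePricePerM(0.075, 1.25, 0.2).toFixed(3)); // 0.325
console.log(breakEvenEscalationRate(0.075, 1.25).toFixed(2)); // 0.94
```

Under these assumptions a 20% escalation rate gives a blended $0.325 per 1M input tokens against $1.25 for Pro-only, and the cascade stays cheaper until roughly 94% of requests escalate. Escalated requests also incur the latency of both calls, since they run one after the other.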
Smart Routing
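```typescript
// Route based on estimated task complexity.
// Task is a hypothetical shape; adapt the fields to your own request metadata.
interface Task {
  type: string; // e.g. 'classification', 'extraction'
  requiresReasoning: boolean;
  outputLength: number; // expected output length (unit assumed: tokens)
}

type Complexity = 'simple' | 'medium' | 'complex';

function selectModel(task: Task): string {
  const complexity = estimateComplexity(task);
  if (complexity === 'simple') {
    return 'gemini-1.5-flash'; // $0.075 per 1M input tokens at time of writing
  } else if (complexity === 'medium') {
    return 'gemini-1.5-flash'; // Medium tasks still go to Flash
  } else {
    return 'gemini-1.5-pro'; // $1.25 per 1M input tokens at time of writing
  }
}

function estimateComplexity(task: Task): Complexity {
  if (task.type === 'classification') return 'simple';
  if (task.type === 'extraction') return 'simple';
  if (task.requiresReasoning) return 'complex';
  if (task.outputLength > 1000) return 'complex';
  return 'medium';
}
```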
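To see what this routing policy saves, you can price a workload by complexity bucket. A sketch under stated assumptions: the token counts are a hypothetical workload, only input tokens are priced (using the per-1M figures from the `selectModel` comments), and `routedCost` and `allProCost` are illustrative helpers that mirror `selectModel`'s policy of sending only complex tasks to Pro.

```typescript
type Complexity = 'simple' | 'medium' | 'complex';

// Input-token prices (USD per 1M) taken from the selectModel comments; verify current pricing.
const PRICE_PER_M = { 'gemini-1.5-flash': 0.075, 'gemini-1.5-pro': 1.25 } as const;
type ModelName = keyof typeof PRICE_PER_M;

// Same policy as selectModel: only 'complex' tasks go to Pro.
const MODEL_FOR: Record<Complexity, ModelName> = {
  simple: 'gemini-1.5-flash',
  medium: 'gemini-1.5-flash',
  complex: 'gemini-1.5-pro',
};

// Cost of a workload when each bucket is routed to its model.
function routedCost(tokensByComplexity: Record<Complexity, number>): number {
  let total = 0;
  for (const c of Object.keys(tokensByComplexity) as Complexity[]) {
    total += (tokensByComplexity[c] / 1_000_000) * PRICE_PER_M[MODEL_FOR[c]];
  }
  return total;
}

// Cost of the same workload if everything is sent to Pro.
function allProCost(tokensByComplexity: Record<Complexity, number>): number {
  const tokens = Object.values(tokensByComplexity).reduce((a, b) => a + b, 0);
  return (tokens / 1_000_000) * PRICE_PER_M['gemini-1.5-pro'];
}

// Hypothetical workload: input tokens per complexity bucket.
const workload: Record<Complexity, number> = {
  simple: 60_000_000,
  medium: 30_000_000,
  complex: 10_000_000,
};

console.log(routedCost(workload).toFixed(2)); // 19.25
console.log(allProCost(workload).toFixed(2)); // 125.00
```

For this hypothetical mix, routing costs $19.25 against $125.00 for an all-Pro setup; the saving depends almost entirely on how much traffic lands in the complex bucket.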