32.3 Choosing small vs large models strategically
The Model Tier System
Think of models as employees with different hourly rates.
- The Intern (Gemini Flash / Small Models): Fast, cheap, eager. Great for summarizing, formatting, simple classification, and extracting data. Bad at complex reasoning or subtle nuance.
- The Senior Engineer (Gemini Pro / Large Models): Expensive, thoughtful, thorough. Necessary for architecture, debugging tricky errors, writing creative content, and handling complex instructions.
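The "hourly rate" comparison can be made concrete with a per-request cost estimate. The sketch below is only an illustration: the prices are round placeholder numbers, not any provider's real rates, and `request_cost` is a hypothetical helper name.

```python
# Placeholder prices in dollars per 1 million tokens.
# These are NOT real rates; substitute your provider's current pricing.
PLACEHOLDER_PRICES = {
    "small": {"input": 1.0, "output": 2.0},
    "large": {"input": 10.0, "output": 20.0},
}


def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate the dollar cost of one request from token counts and per-million prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


if __name__ == "__main__":
    # Compare the same request (1,000 tokens in, 500 tokens out) on both tiers.
    for tier, price in PLACEHOLDER_PRICES.items():
        cost = request_cost(1_000, 500, price["input"], price["output"])
        print(f"{tier}: ${cost:.4f} per request")
```

Multiplying the per-request figure by expected daily traffic shows quickly whether a task justifies the larger tier.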
Model Routing Pattern
You don't have to pick one model for your whole app. You can use a router.
- User asks a question.
- A tiny, cheap classifier (or even a regex keyword match) decides the difficulty.
- Simple? Route to Flash.
- Hard? Route to Pro.
Example: A customer support bot.
- "Reset my password" → handled by Flash (or a deterministic script).
- "My data is corrupted and I'm angry" → handled by Pro (needs empathy and complex troubleshooting).
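A minimal sketch of the regex-based variant of this router, using the support-bot example. The keyword lists, the 20-word cutoff, and the tier labels `SMALL_MODEL` and `LARGE_MODEL` are illustrative assumptions, not tuned values or real model IDs; a production router would more likely use a small classifier model.

```python
import re

# Hypothetical tier labels; map them to whatever model IDs your provider exposes.
SMALL_MODEL = "small-tier"
LARGE_MODEL = "large-tier"

# Illustrative keyword lists only; they are not derived from real traffic.
HARD_PATTERN = re.compile(r"\b(corrupt\w*|angry|crash\w*|lost|debug)\b", re.IGNORECASE)
SIMPLE_PATTERN = re.compile(r"\b(reset|password|unsubscribe|hours)\b", re.IGNORECASE)


def classify_difficulty(message: str) -> str:
    """Return "simple" or "hard" using cheap keyword checks."""
    if HARD_PATTERN.search(message):
        return "hard"
    # Short messages that match a known routine request count as simple.
    if SIMPLE_PATTERN.search(message) and len(message.split()) <= 20:
        return "simple"
    # When unsure, prefer quality over cost and escalate.
    return "hard"


def route(message: str) -> str:
    """Pick the model tier for a user message."""
    return SMALL_MODEL if classify_difficulty(message) == "simple" else LARGE_MODEL


if __name__ == "__main__":
    for text in ("Reset my password", "My data is corrupted and I'm angry"):
        print(f"{text!r} -> {route(text)}")
```

Defaulting to the large tier when nothing matches trades some cost for safety: a misrouted hard question is usually more expensive than an over-served easy one.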
Hybrid Workflows
Use the big model to generate the plan, and the small model to execute it.
The "Architect-Builder" Pattern:
1. (Pro) Read the spec and generate a list of 5 files to create.
2. (Flash) Write file 1.
3. (Flash) Write file 2.
...
7. (Pro) Review all files and spot integration bugs.
This gives you high-quality direction with low-cost implementation.
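The pattern can be sketched as a small orchestration function. This is an outline under assumptions: `large` and `small` stand in for whatever client calls you use for each tier (any function that takes a prompt string and returns text), and the prompt wording and one-path-per-line plan format are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Any function that maps a prompt to generated text; wrap your real API client here.
ModelFn = Callable[[str], str]


def architect_builder(spec: str, large: ModelFn, small: ModelFn) -> Tuple[Dict[str, str], str]:
    """Plan with the large model, write each file with the small model, review with the large model."""
    # Step 1 (large): ask for the list of files, one path per line.
    plan = large(f"Read this spec and list the files to create, one path per line:\n{spec}")
    paths = [line.strip() for line in plan.splitlines() if line.strip()]

    # Steps 2..N (small): write each file independently.
    files: Dict[str, str] = {}
    for path in paths:
        files[path] = small(f"Spec:\n{spec}\n\nWrite the full contents of {path}.")

    # Final step (large): review everything together for integration bugs.
    bundle = "\n\n".join(f"# {path}\n{content}" for path, content in files.items())
    review = large(f"Review these files and list any integration bugs:\n{bundle}")
    return files, review
```

With five planned files this makes two large-model calls and five small-model calls, so most of the output tokens are billed at the cheaper rate.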
Distillation
You can also use the Pro model to generate synthetic training data (examples) to fine-tune a Flash model. This lets the small model "punch above its weight" for specific tasks.
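A rough sketch of the data-generation half of this idea: run task prompts through the large model and save the prompt/answer pairs as JSONL. The `teacher` callable and the `input`/`output` field names are assumptions; fine-tuning services each define their own record format, so check the one you use before uploading.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_distillation_set(prompts: Iterable[str],
                           teacher: Callable[[str], str]) -> List[Dict[str, str]]:
    """Collect (prompt, teacher answer) pairs, skipping empty answers."""
    examples: List[Dict[str, str]] = []
    for prompt in prompts:
        answer = teacher(prompt).strip()
        if answer:
            # Field names are hypothetical; rename to match your fine-tuning format.
            examples.append({"input": prompt, "output": answer})
    return examples


def to_jsonl(examples: List[Dict[str, str]]) -> str:
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(example, ensure_ascii=False) for example in examples)
```

In practice you would also review or filter the generated answers before training, since the small model will learn the teacher's mistakes along with its strengths.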