32.3 Choosing small vs large models strategically

Not every request deserves your most capable model. Routing each task to the smallest model that handles it reliably cuts cost and latency without sacrificing quality.

When to Use Each

Task                 Model   Reason
Classification       Flash   Simple task; speed matters
Extraction           Flash   Structured output, not creative
Summarization        Flash   Compression, not generation
Code generation      Pro     Quality matters more than speed
Complex reasoning    Pro     Chain-of-thought reasoning needed
Multi-step planning  Pro     Needs working memory

Model Cascade

// Try the cheap model first; escalate only if needed.
// Note: `confidence` below is application-defined — the Gemini API does not
// return a confidence score, so you must derive one yourself (e.g. from a
// self-evaluation prompt or a log-probability heuristic).
async function cascadeGenerate(prompt: string) {
  // Step 1: Try Flash
  const flashResponse = await flash.generate(prompt);
  
  // Step 2: Check confidence
  if (flashResponse.confidence > 0.9) {
    return flashResponse;  // Flash is good enough
  }
  
  // Step 3: Escalate to Pro
  console.log('Escalating to Pro due to low confidence');
  return pro.generate(prompt);
}

// Typical outcome: the large majority of requests (e.g. ~80%) are handled
// by Flash (cheap); only the remainder (~20%) escalates to Pro
// (expensive but necessary). The exact split depends on your traffic.
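The cascade can be exercised end to end with stubbed model clients. Everything in this sketch is an assumption for illustration: the `flash`/`pro` client objects, the `generate` signature, and especially the `confidence` field, which you would have to compute yourself in a real system.

```typescript
// Minimal stand-ins for model clients; real clients would call the API.
interface ModelResponse { text: string; confidence: number; }
interface ModelClient { generate(prompt: string): Promise<ModelResponse>; }

// Hypothetical stubs: Flash is "confident" only on short prompts.
const flash: ModelClient = {
  async generate(prompt) {
    return { text: `flash:${prompt}`, confidence: prompt.length < 20 ? 0.95 : 0.5 };
  },
};
const pro: ModelClient = {
  async generate(prompt) {
    return { text: `pro:${prompt}`, confidence: 0.99 };
  },
};

async function cascadeGenerate(prompt: string): Promise<ModelResponse> {
  const flashResponse = await flash.generate(prompt);
  if (flashResponse.confidence > 0.9) {
    return flashResponse;            // cheap path: Flash is good enough
  }
  return pro.generate(prompt);       // escalate to Pro
}
```

At the listed prices, if 80% of traffic stays on Flash the blended input cost is 0.8 × $0.075 + 0.2 × $1.25 ≈ $0.31 per 1M tokens, roughly a 75% saving versus sending everything to Pro.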

Smart Routing

// Route based on task complexity
function selectModel(task: Task): string {
  const complexity = estimateComplexity(task);
  
  if (complexity === 'simple') {
    return 'gemini-1.5-flash';  // $0.075 / 1M input tokens
  } else if (complexity === 'medium') {
    return 'gemini-1.5-flash';  // Still use Flash
  } else {
    return 'gemini-1.5-pro';    // $1.25 / 1M input tokens
  }
}

function estimateComplexity(task: Task): string {
  if (task.type === 'classification') return 'simple';
  if (task.type === 'extraction') return 'simple';
  if (task.requiresReasoning) return 'complex';
  if (task.outputLength > 1000) return 'complex';
  return 'medium';
}
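The guide never shows the `Task` shape, so here is one possible definition (an assumption inferred from the checks in `estimateComplexity`) together with the router exercised on sample tasks:

```typescript
// Hypothetical Task shape implied by the complexity checks above.
interface Task {
  type: 'classification' | 'extraction' | 'generation';
  requiresReasoning?: boolean;
  outputLength?: number;
}

function estimateComplexity(task: Task): 'simple' | 'medium' | 'complex' {
  if (task.type === 'classification') return 'simple';
  if (task.type === 'extraction') return 'simple';
  if (task.requiresReasoning) return 'complex';
  if ((task.outputLength ?? 0) > 1000) return 'complex';
  return 'medium';
}

function selectModel(task: Task): string {
  const complexity = estimateComplexity(task);
  // Medium tasks still go to Flash; only 'complex' pays for Pro.
  return complexity === 'complex' ? 'gemini-1.5-pro' : 'gemini-1.5-flash';
}

// Example routing decisions:
selectModel({ type: 'classification' });                      // Flash
selectModel({ type: 'generation', requiresReasoning: true }); // Pro
```

One subtlety worth noting: the checks run in order, so a classification task that also sets `requiresReasoning` still routes to Flash. If that is not what you want, test the reasoning flag first.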
