32.4 Batch processing vs interactive mode

This section compares interactive (per-request) and batch processing modes, then shows two batching patterns: collapsing many items into one request, and micro-batching for near-real-time workloads.

Comparison

Aspect         | Interactive      | Batch
---------------|------------------|--------------------------
Latency        | Low (real-time)  | High (acceptable)
Cost           | Higher           | Lower
Use case       | Chat, real-time  | Reports, bulk processing
Error handling | Per-request      | Retry entire batch
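
The latency row can be made concrete with a back-of-envelope model. The numbers below are illustrative assumptions (not measured or provider-specific): a fixed per-call overhead plus a per-item compute cost.

```typescript
const ITEMS = 100;
const OVERHEAD_MS = 300; // assumed fixed cost per API call (network, queueing)
const COMPUTE_MS = 50;   // assumed model time per item

// Interactive: each user waits for exactly one small call — low latency.
const perItemLatency = OVERHEAD_MS + COMPUTE_MS;            // 350 ms

// Batch: one call covers everything. Higher latency for the set as a whole,
// but the per-call overhead is paid once instead of 100 times.
const batchLatency = OVERHEAD_MS + ITEMS * COMPUTE_MS;      // 5,300 ms

// For comparison: pushing all 100 items through sequential interactive calls
// pays the overhead every time.
const sequentialInteractiveTotal = ITEMS * perItemLatency;  // 35,000 ms
```

Under these assumptions, batching the full set is roughly 6–7x faster end to end, while interactive mode keeps each individual response fast.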

Batching Strategy

// Instead of N separate requests, batch into 1
async function batchClassify(items: string[]) {
  // Bad: N API calls
  // for (const item of items) {
  //   await classify(item);  // N × latency, N × overhead
  // }
  
  // Good: 1 API call
  const prompt = `
Classify each item (output JSON array):
${items.map((item, i) => `${i+1}. ${item}`).join('\n')}
`;
  
  const response = await model.generate(prompt);
  return JSON.parse(response);  // throws if the model emits malformed JSON
}

// Efficiency gain:
// 1 request × (100 items × ~10 tokens each) ≈ 1,000 prompt tokens, overhead paid once
// vs. 100 requests, each paying per-request overhead (latency, prompt preamble)

// Cost difference:
// Interactive: 100 requests × $0.002 = $0.20
// Batched: 1 request × $0.005 = $0.005 (40x cheaper)
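
The trade-off from the table applies here: one malformed or truncated batch response invalidates every item in it, so validate before trusting the output and retry the whole batch on failure. A minimal sketch (`parseBatchResponse` is a hypothetical helper, not part of any SDK):

```typescript
// Validate that the model's JSON output covers every item in the batch.
// A non-array, short, or malformed response means the whole batch must be retried.
function parseBatchResponse(raw: string, expectedCount: number): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("Batch response is not valid JSON; retry the batch");
  }
  if (!Array.isArray(parsed) || parsed.length !== expectedCount) {
    throw new Error(`Expected ${expectedCount} results; retry the batch`);
  }
  return parsed.map(String);
}
```

Callers would wrap `batchClassify` in a retry loop that re-sends the entire prompt when this throws, since there is no way to retry a single item inside a combined response.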

When to Batch

  • Batch: Processing CSV uploads, generating reports, background jobs
  • Interactive: Chat, real-time suggestions, user-facing responses
  • Hybrid: buffer incoming requests for ~100 ms, then send whatever arrived as one batch

// Micro-batching for near-real-time: buffer requests briefly, then send
// one batched call for everything that arrived in the window.

interface Request {
  input: string;
  resolve: (result: string) => void;  // fulfilled when the batch returns
}

// Assumed to exist elsewhere: makes a single batched API call and
// returns one result per request, in order.
declare function batchProcess(batch: Request[]): Promise<string[]>;

class RequestBatcher {
  private buffer: Request[] = [];
  private timeout: NodeJS.Timeout | null = null;

  add(request: Request) {
    this.buffer.push(request);

    // The first request into an empty buffer starts the 100ms flush window.
    if (!this.timeout) {
      this.timeout = setTimeout(() => this.flush(), 100);
    }
  }

  async flush() {
    // Swap the buffer out first so requests arriving mid-flush start a fresh batch.
    const batch = this.buffer;
    this.buffer = [];
    this.timeout = null;

    const results = await batchProcess(batch);
    batch.forEach((req, i) => req.resolve(results[i]));
  }
}
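
A self-contained, runnable sketch of the same pattern: the names here (`MicroBatcher`, `batchProcessStub`, `submit`) are illustrative, and the stub that uppercases inputs stands in for the real batched model call. `submit` returns a Promise, so interactive callers simply await their result.

```typescript
interface Req {
  input: string;
  resolve: (result: string) => void;
}

// Stub standing in for one real batched API call: uppercases each input.
async function batchProcessStub(batch: Req[]): Promise<string[]> {
  return batch.map((r) => r.input.toUpperCase());
}

class MicroBatcher {
  private buffer: Req[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  // Promise-returning entry point: callers just await a per-item result.
  submit(input: string): Promise<string> {
    return new Promise((resolve) => {
      this.buffer.push({ input, resolve });
      if (!this.timer) {
        this.timer = setTimeout(() => this.flush(), 100);
      }
    });
  }

  private async flush() {
    const batch = this.buffer;
    this.buffer = [];
    this.timer = null;
    const results = await batchProcessStub(batch);
    batch.forEach((req, i) => req.resolve(results[i]));
  }
}
```

Two `submit` calls made within the same 100 ms window share a single flush and therefore a single downstream call, which is the hybrid row of the comparison table in action.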

Where to go next