32.4 Batch processing vs interactive mode


Interactive Mode: Optimize for Speed

When a human is waiting (chatbot, autocomplete), latency is king. You pay a premium for immediate availability.

  • Use streaming so the user sees the first tokens immediately (see the sketch below).
  • Prefer smaller, faster models when quality allows.
  • Keep context windows tight to reduce prefill time.
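
As a sketch of the streaming point, here is what it looks like with the google-genai Python SDK; the model name and prompt are placeholders, and the client assumes a GEMINI_API_KEY environment variable:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Stream the response so the user sees the first tokens immediately,
# instead of waiting for the entire completion to finish.
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash",  # a smaller, faster model keeps latency low
    contents="Autocomplete this sentence: The quickest way to cut latency is",
):
    print(chunk.text, end="", flush=True)
```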

Batch Mode: Optimize for Throughput

Many AI tasks don't need to happen now. They just need to happen today.

  • Summarizing yesterday's meeting logs.
  • Tagging a backlog of 1,000 support tickets.
  • Generating unit tests for an entire legacy codebase.

For these, use Batch Processing. You send a file with 10,000 requests, go to sleep, and wake up with the results.
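Concretely, "a file with 10,000 requests" is usually a JSONL file with one request per line. The exact schema varies by provider, so treat the field names below (`key`, `request`) as an illustrative sketch of the Gemini batch format and check your provider's docs:

```python
import json

# One request per line; each needs a unique key so results can be
# matched back to their inputs after the batch completes.
tickets = ["Printer on fire", "Password reset loop", "App crashes on login"]

with open("batch_requests.jsonl", "w") as f:
    for i, ticket in enumerate(tickets):
        line = {
            "key": f"ticket-{i}",
            "request": {
                "contents": [{"parts": [{"text": f"Tag this support ticket: {ticket}"}]}]
            },
        }
        f.write(json.dumps(line) + "\n")
```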

Using the Batch API

The Gemini API and Vertex AI both offer a Batch API (a deferred, asynchronous processing mode). The benefits are massive:

  1. 50% Lower Cost: Batch requests are typically priced at about half the real-time rate, because the provider can schedule them during off-peak capacity instead of serving them instantly.
  2. Higher Rate Limits: You can queue up far more tokens than your per-minute quota would allow.
  3. Reliability: The platform manages retries and queueing for you.
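
A minimal submit-and-poll sketch, assuming the google-genai SDK's batch interface and the JSONL file built above; the model name is a placeholder, and the exact config fields and job states are worth verifying against the current docs:

```python
import time
from google import genai

client = genai.Client()

# Upload the request file, then hand the whole thing to the batch queue.
src = client.files.upload(
    file="batch_requests.jsonl",
    config={"mime_type": "jsonl"},  # JSONL type is not inferred automatically
)
job = client.batches.create(model="gemini-2.0-flash", src=src.name)

# Poll occasionally; batch jobs are queued, so this can take hours.
while job.state.name not in ("JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED"):
    time.sleep(60)
    job = client.batches.get(name=job.name)

print("Batch finished with state:", job.state.name)
```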

Night Shift

If you are building a "Vibe Coding" tool that refactors code, don't make the developer watch it write. Design a "nightly refactor" agent that runs in batch mode and opens a PR in the morning.
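
As a sketch of that design, the morning half of the agent could be a scheduled script that commits whatever the overnight batch produced and opens a PR. This assumes the batch results have already been applied to the working tree, and that the GitHub CLI (gh) is installed and authenticated; the branch name and messages are placeholders:

```python
import subprocess

BRANCH = "nightly-refactor"

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

# Commit the batch-generated changes on a fresh branch and open a PR
# for a human to review in the morning.
run("git", "checkout", "-b", BRANCH)
run("git", "commit", "-am", "Nightly automated refactor")
run("git", "push", "-u", "origin", BRANCH)
run("gh", "pr", "create",
    "--title", "Nightly automated refactor",
    "--body", "Opened by the batch refactor agent. Review before merging.")
```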
