33.5 Warm starts and connection reuse

This section explains why the first request is slow and how warm-up and connection reuse reduce that cost.

Cold Start Problem

The first request after a process starts is slower because it pays one-time initialization costs:

  • SDK initialization
  • Connection establishment
  • TLS handshake
  • Authentication

For example, with illustrative numbers:

First request:  500ms overhead + 1000ms generation = 1500ms
Subsequent:      50ms overhead + 1000ms generation = 1050ms
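
The arithmetic above can be checked with a small helper. This is a minimal sketch: `requestLatency` is a hypothetical name, and the inputs are the illustrative figures from the example, not measurements.

```typescript
// Hypothetical helper: splits total latency into overhead and generation time.
interface LatencyBreakdown {
  totalMs: number;
  overheadShare: number; // fraction of the total spent on overhead, 0..1
}

function requestLatency(overheadMs: number, generationMs: number): LatencyBreakdown {
  const totalMs = overheadMs + generationMs;
  return { totalMs, overheadShare: overheadMs / totalMs };
}

const cold = requestLatency(500, 1000); // first request
const warm = requestLatency(50, 1000);  // subsequent request

console.log(cold.totalMs, warm.totalMs);            // 1500 1050
console.log((cold.overheadShare * 100).toFixed(1)); // 33.3
console.log((warm.overheadShare * 100).toFixed(1)); // 4.8
```

With these numbers, warming removes 450ms, which is 30% of the first-request latency.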

Warm-up Strategies

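// Initialize the SDK at app start, not on the first request
import { GoogleGenerativeAI } from '@google/generative-ai';

// Fail fast at startup if the key is missing
const apiKey = process.env.GOOGLE_API_KEY;
if (!apiKey) {
  throw new Error('GOOGLE_API_KEY is not set');
}

// Create once, reuse everywhere
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });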

// Warm-up on app start
async function warmUp() {
  try {
    await model.generateContent('Hello');  // Tiny request
    console.log('Model warmed up');
  } catch (e) {
    console.error('Warm-up failed:', e);
  }
}

// Call during app initialization (top-level await requires an ES module)
await warmUp();

// For serverless: keep the container warm
// with a scheduled ping every few minutes
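
One way a serverless handler might treat the scheduled ping is sketched below. The event shape (`warmup`, `prompt`), the `Client` interface, and the echo client are assumptions made for illustration; a real deployment would use its platform's event format and the actual SDK client.

```typescript
// Hypothetical event shape: the scheduler is assumed to send { warmup: true }.
interface InvokeEvent {
  warmup?: boolean;
  prompt?: string;
}

// Stand-in for a real SDK client; only the reuse pattern matters here.
interface Client {
  generate(prompt: string): Promise<string>;
}

let initCount = 0; // how many times setup actually ran
let cachedClient: Client | undefined;

// Module-scope cache: a warm container skips the expensive setup.
function getClient(): Client {
  if (cachedClient === undefined) {
    initCount += 1;
    cachedClient = { generate: async (prompt) => `echo: ${prompt}` };
  }
  return cachedClient;
}

async function handler(event: InvokeEvent): Promise<string> {
  const client = getClient();
  if (event.warmup === true) {
    return 'warm'; // scheduled ping: initialize, then return without real work
  }
  return client.generate(event.prompt ?? '');
}
```

Because the client lives at module scope, a ping that reaches an already warm container only touches the cache, and the next real request skips initialization.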

Connection Management

// Use HTTP keep-alive
import { Agent } from 'https';

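const keepAliveAgent = new Agent({
  keepAlive: true,        // Keep sockets open between requests
  keepAliveMsecs: 30000,  // Initial delay for TCP keep-alive probes
  maxSockets: 10          // Max concurrent sockets per host
});

// Pass the agent to your API client if it supports one.
// APIClient is a placeholder here, not a real class.
const client = new APIClient({ agent: keepAliveAgent });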
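
For a client that does not accept an agent, the same idea can be shown with Node's built-in `https.request`. This is a sketch only: `getStatus` and `sharedAgent` are hypothetical names, and the URL you pass in is up to you.

```typescript
import { Agent, request } from 'node:https';

// One agent for the whole process, so sockets can be reused across calls.
const sharedAgent = new Agent({ keepAlive: true, maxSockets: 10 });

// Hypothetical helper: resolves with the HTTP status code of a GET request.
function getStatus(url: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const req = request(url, { agent: sharedAgent, method: 'GET' }, (res) => {
      res.resume(); // drain the body so the socket can return to the agent's pool
      res.on('end', () => resolve(res.statusCode ?? 0));
    });
    req.on('error', reject);
    req.end();
  });
}
```

Calling `getStatus` twice in sequence against the same host should let the second call reuse the open socket, avoiding another TCP and TLS handshake.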

// Connection pooling for databases
import { Pool } from 'pg';

const pool = new Pool({
  max: 20,                    // Max connections
  idleTimeoutMillis: 30000,   // Close idle after 30s
  connectionTimeoutMillis: 2000
});

// Reuse pooled connections instead of opening one per request
async function query(sql: string, params: unknown[] = []) {
  const client = await pool.connect();
  try {
    return await client.query(sql, params);  // Parameterized to avoid SQL injection
  } finally {
    client.release();  // Return to pool
  }
}
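
To make the acquire and release cycle concrete, here is a deliberately small pool written from scratch. It is not how `pg` implements its pool; `SimplePool` is a hypothetical class that omits waiting, timeouts, and health checks, and exists only to show why releasing a resource lets the next caller skip setup.

```typescript
// Illustrative only: real pools also handle waiting, timeouts, and broken connections.
class SimplePool<T> {
  private idle: T[] = [];
  private inUse = 0;
  private readonly factory: () => T;
  private readonly max: number;
  created = 0; // number of resources ever created

  constructor(factory: () => T, max: number) {
    this.factory = factory;
    this.max = max;
  }

  acquire(): T {
    const reused = this.idle.pop();
    if (reused !== undefined) {
      this.inUse += 1;
      return reused; // reuse: no setup cost
    }
    if (this.inUse >= this.max) {
      throw new Error('pool exhausted');
    }
    this.created += 1;
    this.inUse += 1;
    return this.factory(); // pay the setup cost only when nothing is idle
  }

  release(resource: T): void {
    this.inUse -= 1;
    this.idle.push(resource);
  }
}

const connections = new SimplePool(() => ({ id: Math.random() }), 2);
const first = connections.acquire();
connections.release(first);
const second = connections.acquire(); // same object as `first`
console.log(first === second, connections.created); // true 1
```

Forgetting to call `release` is the usual failure mode: the pool then reaches `max` and every later caller fails or waits, which is why the `finally` block in `query` matters.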

Where to go next