33.5 Warm starts and connection reuse
Cold Start Problem
First requests are slow due to initialization overhead:
- SDK initialization
- Connection establishment
- TLS handshake
- Authentication
First request: 500ms overhead + 1000ms generation = 1500ms
Subsequent: 50ms overhead + 1000ms generation = 1050ms
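To confirm the gap in your own environment, time two back-to-back requests. A rough sketch, where callModel() is a hypothetical stand-in for whatever request function your app actually uses:

// Measurement sketch: the first call pays the initialization cost,
// the second reuses the already-initialized client and connection.
// callModel() is a hypothetical stand-in for your request function.
async function measureColdVsWarm(callModel: () => Promise<unknown>) {
  let start = Date.now();
  await callModel(); // cold: SDK init + TLS handshake + auth
  console.log(`cold request: ${Date.now() - start}ms`);

  start = Date.now();
  await callModel(); // warm: client and connection reused
  console.log(`warm request: ${Date.now() - start}ms`);
}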
Warm-up Strategies
// Initialize the SDK at app start, not on the first request
import { GoogleGenerativeAI } from '@google/generative-ai';

// Create once, reuse everywhere
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });

// Warm up on app start
async function warmUp() {
  try {
    await model.generateContent('Hello'); // Tiny request
    console.log('Model warmed up');
  } catch (e) {
    console.error('Warm-up failed:', e);
  }
}

// Call during app initialization
await warmUp();
// For serverless: keep container warm
// Use a scheduled ping every few minutes
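A minimal sketch of that scheduled ping, assuming an Express app and an external scheduler (cron, Cloud Scheduler, EventBridge) that hits a /warmup route every few minutes; the route name is illustrative, and model comes from the snippet above:

import express from 'express';

const app = express();

// Hypothetical keep-warm endpoint; an external scheduler calls it every
// few minutes so the container and its initialized client stay resident.
app.get('/warmup', async (_req, res) => {
  try {
    await model.generateContent('ping'); // model from the snippet above
    res.status(200).send('warm');
  } catch {
    res.status(500).send('warm-up failed');
  }
});

app.listen(Number(process.env.PORT) || 8080);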
Connection Management
// Use HTTP keep-alive
import { Agent } from 'https';

const keepAliveAgent = new Agent({
  keepAlive: true,
  keepAliveMsecs: 30000,
  maxSockets: 10
});

// Pass to API client if supported
const client = new APIClient({ agent: keepAliveAgent });
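If the SDK does not accept a custom agent, the same agent can still be applied to requests you issue yourself through Node's https module. A usage sketch with an illustrative URL:

import { request } from 'https';

// Route a raw HTTPS request through the keep-alive agent; the socket
// returns to the agent's pool once the response has been drained.
const req = request('https://api.example.com/v1/ping', { agent: keepAliveAgent }, (res) => {
  res.resume(); // drain so the socket can be reused
});
req.on('error', (err) => console.error('request failed:', err));
req.end();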
// Connection pooling for databases
import { Pool } from 'pg';

const pool = new Pool({
  max: 20,                      // Max connections
  idleTimeoutMillis: 30000,     // Close idle connections after 30s
  connectionTimeoutMillis: 2000 // Give up after 2s when acquiring a connection
});

// Reuse pooled connections; don't open a new one per request
async function query(sql: string) {
  const client = await pool.connect();
  try {
    return await client.query(sql);
  } finally {
    client.release(); // Return the client to the pool
  }
}
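For single statements outside a transaction, node-postgres can also manage the checkout itself: pool.query() acquires a client, runs the statement, and releases the client automatically. A short usage sketch with a made-up table name:

// Equivalent for one-off statements; the pool handles acquire/release.
const { rows } = await pool.query('SELECT id FROM documents LIMIT 10');
console.log(`fetched ${rows.length} rows`);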