33.5 Warm starts and connection reuse

This section covers two sources of avoidable latency: the TLS/TCP handshake tax you pay for new connections, and serverless cold starts.

The TLS/TCP Handshake

Setting up a secure connection to `generativelanguage.googleapis.com` takes time: roughly 3 or 4 network round trips (DNS lookup, the TCP three-way handshake, then one or two more round trips for TLS, depending on the TLS version). If you create a new client for every request, you pay this penalty every time.
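A back-of-the-envelope sketch of that cost. The function name and the per-phase round-trip counts here are illustrative assumptions, not measured values:

```python
# Rough overhead before the first byte of a response on a *fresh*
# connection, assuming no DNS caching and no TLS session resumption.
def handshake_overhead_ms(ping_ms: float, tls_round_trips: int = 2) -> float:
    dns = ping_ms                    # DNS lookup: assume 1 round trip
    tcp = ping_ms                    # TCP three-way handshake: 1 round trip
    tls = ping_ms * tls_round_trips  # TLS 1.2: 2 round trips; TLS 1.3: 1
    return dns + tcp + tls

print(handshake_overhead_ms(50))                      # 50 ms ping, TLS 1.2 -> 200.0
print(handshake_overhead_ms(50, tls_round_trips=1))   # TLS 1.3 -> 150.0
```

At a 50 ms ping, that is 150-200 ms of pure overhead per request, before the model has seen a single token.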

Fix: Reuse your HTTP client / API client instance. In Node.js or Python, keep the client in a module-level (singleton) variable so it survives across requests. Do not instantiate `new VertexAI()` inside your request handler if you can avoid it.
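One way to get that singleton behavior in Python is a lazily cached factory. `ApiClient`, `get_client`, and `handle_request` below are hypothetical names standing in for your real API client:

```python
import functools

class ApiClient:
    """Stand-in for an expensive client (e.g. the one doing TLS + auth)."""
    def __init__(self):
        # Imagine the handshake and credential exchange happening here.
        self.ready = True

@functools.lru_cache(maxsize=None)
def get_client() -> ApiClient:
    # lru_cache turns this into a lazy singleton: the client is built
    # on the first call and the same instance is returned ever after.
    return ApiClient()

def handle_request(prompt: str) -> str:
    client = get_client()  # reused -- no new handshake per request
    return f"would send {prompt!r} on client {id(client)}"

assert get_client() is get_client()  # same instance every time
```

The same idea in Node.js is even simpler: construct the client at module scope, since modules are evaluated once per process.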

The Serverless Cold Start

If you host your vibe coding app on AWS Lambda or Google Cloud Functions:

  1. Request comes in.
  2. Cloud provider boots a container (1-2s).
  3. Your code imports libraries (0.5s).
  4. You call LLM (5s).

That 1.5-2.5s of overhead, before the model call even starts, is very noticeable to users.

Fixes:

  - Use provisioned concurrency (keep at least one instance warm).
  - Use lighter libraries, and lazy-load huge dependencies.
  - Or, for internal tools, just run a persistent server (e.g. a Docker container) instead of serverless functions.
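Lazy loading can be sketched like this. The handler shape and `statistics` standing in for a heavy dependency are illustrative assumptions; the pattern is what matters:

```python
# Module top level: import only what every request needs (cheap).
import json

_heavy = None

def _get_heavy():
    # Deferred import: paid once, on the first request that needs it,
    # instead of during the cold start of every function instance.
    global _heavy
    if _heavy is None:
        import statistics as heavy_lib  # stand-in for a heavy dependency
        _heavy = heavy_lib
    return _heavy

def handler(event: dict):
    if event.get("needs_analysis"):
        return _get_heavy().mean(event["values"])
    return json.dumps({"status": "ok"})

print(handler({"needs_analysis": True, "values": [1, 2, 3]}))  # 2
```

Requests that skip the heavy path never pay the import cost at all, which keeps the common-case cold start short.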

Where to go next