29.2 Circuit breakers and fallback modes

Goal: prevent cascading failures when the model is unhealthy

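When the model or provider is unhealthy, naive retry behavior can turn a partial outage into a full outage: every failed call is retried, the retries add load to an already struggling provider, and requests pile up behind long timeouts.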

Circuit breakers and fallback modes are how you keep the rest of your system healthy during model trouble.

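A circuit breaker is a policy: after enough failures, stop calling the failing dependency for a cooldown period, then allow a trial call to check whether it has recovered.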

The goal is to stop making failing calls long enough for recovery—and to protect your app from retry storms.
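
One common way to express such a policy is a three-state breaker (closed, open, half-open). The sketch below is a minimal illustration rather than a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `cooldown_s` are hypothetical, it counts consecutive failures only, and it is not thread-safe.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_s:
            return "half_open"  # cooldown elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            # Fail fast instead of sending another doomed request.
            raise CircuitOpenError("breaker open; skipping model call")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success (including a half-open trial) closes the breaker.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed half-open trial reopens immediately; otherwise open
        # only once the consecutive-failure threshold is reached.
        if self._opened_at is not None or self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

Callers would wrap each model request in `breaker.call(...)` and treat `CircuitOpenError` as the signal to switch to a fallback mode instead of retrying.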

Circuit breakers are a cost control tool

Even if you can “afford” failures, you often can’t afford unlimited retries during outages. Breakers put a hard cap on burn.

Trip breakers based on objective signals, not feelings:

Error rate: percentage of 5xx responses or timeouts over a rolling window.
Rate limits: sustained 429 responses.
Latency: p95/p99 above threshold for a window.
Validation failure rate: sudden increase in invalid JSON/schema failures (can indicate model/prompt mismatch).
Cost spikes: token usage per request exceeding budgets.
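
The signals above can be computed from a rolling window of recent calls. The following is a sketch under stated assumptions: `CallRecord`, `Thresholds`, and `RollingSignals` are hypothetical names, a status of `0` is used here as a convention for timeouts, all threshold values are placeholders, and the p95 is a simple index-based approximation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class CallRecord:
    timestamp: float      # seconds, from any monotonic clock
    status: int           # HTTP-style status; 0 marks a timeout (assumed convention)
    latency_s: float
    valid_output: bool    # did the response pass JSON/schema validation?
    tokens: int


@dataclass
class Thresholds:
    # Placeholder values; tune them per feature or route.
    min_samples: int = 20
    max_error_rate: float = 0.25
    max_rate_limited: float = 0.25
    max_p95_latency_s: float = 10.0
    max_invalid_rate: float = 0.20
    max_avg_tokens: float = 4000.0


class RollingSignals:
    def __init__(self, window_s: float = 60.0,
                 limits: Optional[Thresholds] = None) -> None:
        self.window_s = window_s
        self.limits = limits or Thresholds()
        self._records: Deque[CallRecord] = deque()

    def record(self, rec: CallRecord) -> None:
        self._records.append(rec)
        self._evict(rec.timestamp)

    def _evict(self, now: float) -> None:
        # Drop records that have fallen out of the rolling window.
        while self._records and now - self._records[0].timestamp > self.window_s:
            self._records.popleft()

    def trip_reasons(self, now: float) -> List[str]:
        """Return the names of all signals currently over threshold."""
        self._evict(now)
        recs = list(self._records)
        n = len(recs)
        if n < self.limits.min_samples:
            return []  # too little data to judge
        error_rate = sum(r.status == 0 or r.status >= 500 for r in recs) / n
        limited_rate = sum(r.status == 429 for r in recs) / n
        invalid_rate = sum(not r.valid_output for r in recs) / n
        latencies = sorted(r.latency_s for r in recs)
        p95 = latencies[min(n - 1, int(0.95 * n))]
        avg_tokens = sum(r.tokens for r in recs) / n

        reasons: List[str] = []
        if error_rate > self.limits.max_error_rate:
            reasons.append("error_rate")
        if limited_rate > self.limits.max_rate_limited:
            reasons.append("rate_limited")
        if p95 > self.limits.max_p95_latency_s:
            reasons.append("latency_p95")
        if invalid_rate > self.limits.max_invalid_rate:
            reasons.append("validation_failures")
        if avg_tokens > self.limits.max_avg_tokens:
            reasons.append("cost")
        return reasons
```

Returning the list of reasons, rather than a single boolean, keeps the trip decision explainable in logs and alerts.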

Different breakers can exist for different features or routes (e.g., “summarizer” vs “RAG Q&A”).
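
As a rough illustration of per-route isolation, the sketch below keeps independent state per route so that one failing feature does not block another. `RouteBreakers`, the route names, and the thresholds are hypothetical, and recovery is reduced to an explicit `reset` for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, Optional


@dataclass
class RouteState:
    consecutive_failures: int = 0
    open: bool = False


class RouteBreakers:
    """Hypothetical registry: one independent breaker state per route."""

    def __init__(self, thresholds: Optional[Dict[str, int]] = None,
                 default_threshold: int = 5) -> None:
        self._thresholds = thresholds or {}
        self._default = default_threshold
        self._states: DefaultDict[str, RouteState] = defaultdict(RouteState)

    def allow(self, route: str) -> bool:
        return not self._states[route].open

    def record(self, route: str, ok: bool) -> None:
        state = self._states[route]
        if ok:
            state.consecutive_failures = 0
            return
        state.consecutive_failures += 1
        if state.consecutive_failures >= self._thresholds.get(route, self._default):
            state.open = True  # only this route stops calling the model

    def reset(self, route: str) -> None:
        # Stand-in for a real cooldown / trial-call recovery step.
        self._states[route] = RouteState()
```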

Fallback modes should be designed upfront. Common options:

Fallbacks must be safe

In degraded mode, do not relax guardrails. Systems are most likely to leak data and invent facts when they are under stress. Prefer abstention and safe alternatives.

Degraded-mode UX should be:

transparent: clearly indicate reduced capability.
actionable: suggest a next step (try again, search docs, contact support).
non-alarming: present it as “temporarily unavailable” rather than “system broken.”
consistent: same failure state maps to same user experience.
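
One way to keep degraded-mode UX consistent is a single lookup from failure state to user-facing notice. This is a sketch only: the `FailureState` values, the `DegradedNotice` shape, and the message wording are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class FailureState(Enum):
    BREAKER_OPEN = "breaker_open"
    RATE_LIMITED = "rate_limited"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class DegradedNotice:
    message: str    # transparent and non-alarming
    next_step: str  # actionable


# One table, so the same failure state always yields the same notice.
_NOTICES: Dict[FailureState, DegradedNotice] = {
    FailureState.BREAKER_OPEN: DegradedNotice(
        "AI answers are temporarily unavailable.",
        "You can search the docs directly or try again in a few minutes.",
    ),
    FailureState.RATE_LIMITED: DegradedNotice(
        "AI answers are temporarily limited due to high demand.",
        "Please try again shortly.",
    ),
    FailureState.TIMEOUT: DegradedNotice(
        "This answer is taking longer than usual and is temporarily unavailable.",
        "Try again, or contact support if it keeps happening.",
    ),
}


def notice_for(state: FailureState) -> DegradedNotice:
    """Return the fixed notice for a failure state."""
    return _NOTICES[state]
```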

For RAG, a good fallback is often: show top relevant source chunks and let the user click through, instead of generating an answer.
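
The sources-only fallback could look roughly like this. The sketch assumes caller-supplied `retrieve` and `generate` functions and hypothetical `Chunk` and `RagResponse` types. It catches any exception from generation, which a real system would likely narrow to specific error types.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chunk:
    title: str
    url: str
    snippet: str
    score: float  # retrieval relevance; higher is better (assumed)


@dataclass
class RagResponse:
    mode: str               # "answer" or "sources_only"
    answer: Optional[str]
    sources: List[Chunk]


def answer_or_sources(
    question: str,
    retrieve: Callable[[str], List[Chunk]],
    generate: Callable[[str, List[Chunk]], str],
    model_available: bool,
    top_k: int = 3,
) -> RagResponse:
    """Generate an answer when possible; otherwise return top sources only."""
    ranked = sorted(retrieve(question), key=lambda c: c.score, reverse=True)
    chunks = ranked[:top_k]
    if model_available:
        try:
            return RagResponse("answer", generate(question, chunks), chunks)
        except Exception:
            pass  # fall through to the sources-only response
    # Degraded mode: no generated text, just links the user can click through.
    return RagResponse("sources_only", None, chunks)
```

Here `model_available` would typically come from the breaker state, so an open breaker skips generation entirely rather than waiting for a timeout.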

Fallbacks must be tested like any other product behavior: