13.4 Logging and error handling patterns for LLM calls
On this page
- Goal: make LLM calls diagnosable and resilient
- Error taxonomy (what can go wrong)
- Timeouts (must-have)
- Retries with backoff (must be disciplined)
- Invalid output handling (schemas, parsing failures)
- Safety blocks and refusals (normal outcome)
- Logging fields that actually help
- User-facing error behavior (don’t leak details)
- Minimal metrics (the few that matter)
- Copy-paste templates
- Where to go next
Goal: make LLM calls diagnosable and resilient
LLM calls will fail sometimes. Your job is to make failure:
- visible: you can tell what happened and why,
- bounded: failures don’t cascade and take down the app,
- recoverable: retries and fallbacks exist,
- safe: logs don’t leak sensitive data.
This page gives you concrete patterns you can implement in your wrapper layer (13.3).
If you don’t categorize errors, set timeouts, and cap retries, you’ll experience LLM behavior as random. With the right plumbing, it becomes predictable.
Error taxonomy (what can go wrong)
Start with a simple taxonomy. A practical set:
- auth_error: bad credentials, wrong project, missing permissions.
- rate_limit: 429s, tokens-per-minute, concurrency throttles.
- timeout: request took too long; network hung.
- network_error: transient connectivity issues.
- blocked: safety refusal / filtered content.
- invalid_request: your request is malformed (bad params, too long).
- invalid_output: output can’t be parsed/validated (bad JSON, schema mismatch).
- unknown: catch-all with request id for investigation.
The important part is that your code returns categories, not just “error.” Categories drive correct retries and correct UX.
Your wrapper should return an explicit status or error_category so callers can handle it. Don’t bury it in log text.
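A minimal sketch of this taxonomy in Python, assuming a wrapper that hands back a result object; the names ErrorCategory and LLMResult are illustrative, not from any specific library:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorCategory(str, Enum):
    """Outcome categories the wrapper returns instead of a bare 'error'."""
    OK = "ok"
    AUTH_ERROR = "auth_error"
    RATE_LIMIT = "rate_limit"
    TIMEOUT = "timeout"
    NETWORK_ERROR = "network_error"
    BLOCKED = "blocked"
    INVALID_REQUEST = "invalid_request"
    INVALID_OUTPUT = "invalid_output"
    UNKNOWN = "unknown"


@dataclass
class LLMResult:
    """What the wrapper hands back to callers: an explicit category, not log text."""
    category: ErrorCategory
    text: Optional[str] = None        # model output when category is OK
    request_id: Optional[str] = None  # correlation id for investigation
    detail: Optional[str] = None      # safe, non-sensitive detail ("missing field X")
```

Callers branch on result.category to pick the right retry and UX path, instead of string-matching log messages.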
Timeouts (must-have)
Without timeouts:
- requests hang,
- concurrency grows,
- users spam retries,
- your app becomes unstable.
Practical guidance:
- set a default timeout for all model calls,
- expose timeout as a config value (env var),
- log timeouts as their own category,
- prefer a fast failure + retry message over infinite spinners.
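A minimal sketch of these rules, assuming Python with the requests library; the env var MODEL_TIMEOUT_SECONDS and the endpoint URL are placeholders:

```python
import os

import requests

# Expose the timeout as config; fall back to a sane default.
DEFAULT_TIMEOUT_S = float(os.environ.get("MODEL_TIMEOUT_SECONDS", "30"))


def call_model(payload: dict, timeout_s: float = DEFAULT_TIMEOUT_S) -> dict:
    """Call the model endpoint with a hard timeout instead of hanging forever."""
    try:
        resp = requests.post(
            "https://example.invalid/v1/generate",  # placeholder endpoint
            json=payload,
            timeout=timeout_s,  # applies to both connect and read phases
        )
        if resp.status_code == 429:
            return {"category": "rate_limit"}
        if resp.status_code in (401, 403):
            return {"category": "auth_error"}
        resp.raise_for_status()
        return {"category": "ok", "body": resp.json()}
    except requests.Timeout:
        # Timeouts get their own category so they show up in metrics.
        return {"category": "timeout"}
    except requests.ConnectionError:
        return {"category": "network_error"}
    except requests.RequestException:
        return {"category": "unknown"}
```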
Retries with backoff (must be disciplined)
Retries should be:
- selective: only retry retryable errors (rate limit, transient network, some timeouts),
- backed off: exponential backoff with jitter,
- capped: maximum attempts to prevent storms,
- observable: log attempt count and category.
When not to retry automatically
- auth_error: won’t fix itself.
- invalid_request: your code/spec is wrong.
- blocked: repeating the same unsafe request won’t help.
- invalid_output: sometimes retrying helps (if model flaked), but cap aggressively and consider switching to a stricter schema or lower temperature.
Blind retries amplify load, increase cost, and often make rate limiting worse. Always use backoff + caps.
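A minimal sketch of a disciplined retry loop, assuming the call_model sketch from the timeout section above; the constants are illustrative starting points:

```python
import logging
import random
import time

logger = logging.getLogger("llm_calls")

RETRYABLE = {"rate_limit", "network_error", "timeout"}
MAX_ATTEMPTS = 3
BASE_DELAY_S = 1.0


def call_with_retries(payload: dict) -> dict:
    """Selective, capped, backed-off retries around call_model (sketched above)."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result = call_model(payload)
        category = result["category"]
        logger.info("attempt=%d category=%s", attempt, category)  # observable
        if category == "ok" or category not in RETRYABLE:
            return result  # success, or non-retryable error: stop immediately
        if attempt < MAX_ATTEMPTS:
            # Exponential backoff with full jitter prevents retry storms.
            delay = BASE_DELAY_S * (2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
    return result  # last failure after hitting the attempt cap
```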
Invalid output handling (schemas, parsing failures)
LLM outputs are not guaranteed to match your expectations, even if the model is “good.” Handle invalid outputs as a normal error category.
A robust invalid-output flow
- Attempt parse: JSON parse or structured parse.
- Validate schema: required fields, enums, types.
- If invalid: return invalid_output with safe details (e.g., “missing field X”).
- Optionally retry once: with a stricter “repair” prompt or lower temperature.
- Fallback: return a user-friendly error or a safe partial output.
Most invalid outputs come from ambiguous prompts or weak schemas. Tighten the contract before blaming the model.
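A minimal sketch of the parse-then-validate flow, assuming the model returns JSON with a couple of required fields; the field names are illustrative:

```python
import json

REQUIRED_FIELDS = {"title", "summary"}  # illustrative schema


def parse_output(raw_text: str) -> dict:
    """Parse and validate model output; treat failures as invalid_output."""
    # Step 1: attempt parse.
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return {"category": "invalid_output", "detail": "output is not valid JSON"}
    # Step 2: validate schema (required fields, types).
    if not isinstance(data, dict):
        return {"category": "invalid_output", "detail": "output is not a JSON object"}
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        # Safe detail: field names only, never the raw model output.
        return {"category": "invalid_output", "detail": f"missing fields: {sorted(missing)}"}
    return {"category": "ok", "data": data}
```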
Safety blocks and refusals (normal outcome)
Safety behavior should be handled like any other outcome type:
- return blocked status,
- show a refusal-aware UX state,
- offer safe alternatives or clarifying questions,
- log category codes/metadata (not raw content).
Do not treat safety blocks as “mysterious errors.” They’re part of normal product behavior.
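A minimal sketch of treating a block as a normal outcome; the response copy is illustrative:

```python
def handle_blocked(result: dict) -> dict:
    """Turn a blocked outcome into a refusal-aware UX state, not a crash."""
    if result["category"] != "blocked":
        return result
    # Log only the category and correlation metadata, never the raw content.
    return {
        "category": "blocked",
        "user_message": (
            "We can't help with that request. "
            "Try rephrasing it, or ask a related question we can answer."
        ),
    }
```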
Logging fields that actually help
Early logs should answer: “what happened, which prompt/model, how long did it take, what category did it fail with?”
Minimal log fields (good default)
- request_id: correlation id
- timestamp
- app_env: dev/staging/prod
- model: name/version
- prompt_version: id/version string
- latency_ms
- attempt (retry attempt number)
- outcome_category: ok / rate_limit / timeout / blocked / invalid_output / ...
- token_estimates: input/output sizes (approx is fine early)
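A minimal sketch of emitting these fields as one structured JSON record per call, using Python's standard logging module; the field values and logger name are illustrative:

```python
import json
import logging
import time

logger = logging.getLogger("llm_calls")


def log_llm_call(request_id: str, outcome_category: str, latency_ms: int,
                 attempt: int, model: str, prompt_version: str,
                 input_tokens: int, output_tokens: int,
                 app_env: str = "dev") -> None:
    """Emit one structured record per call; no raw prompts or outputs."""
    # Callers generate request_id once per user request, e.g. str(uuid.uuid4()).
    record = {
        "request_id": request_id,
        "timestamp": time.time(),
        "app_env": app_env,                # dev/staging/prod
        "model": model,                    # name/version
        "prompt_version": prompt_version,  # reproduce behavior without full text
        "latency_ms": latency_ms,
        "attempt": attempt,                # retry attempt number
        "outcome_category": outcome_category,
        "token_estimates": {"input": input_tokens, "output": output_tokens},
    }
    logger.info(json.dumps(record))
```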
What to avoid logging by default
- raw prompts (unless you have strict controls),
- raw user inputs (often sensitive),
- raw model outputs (may contain user data),
- headers and credentials.
Log prompt_version and schema_version so you can reproduce behavior without storing the full text everywhere.
User-facing error behavior (don’t leak details)
Users need clarity, not stack traces. Good UX rules:
- Be specific at a high level: “rate limited” vs “something went wrong.”
- Give next steps: “try again in a moment” or “reduce input size.”
- Don’t leak internals: no raw provider errors or request payloads in UI.
- Make retry explicit: a retry button that respects backoff is better than encouraging spam clicks.
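A minimal sketch of a category-to-message mapping that follows these rules; the copy is illustrative:

```python
USER_MESSAGES = {
    "rate_limit": "We're handling a lot of requests right now. Please try again in a moment.",
    "timeout": "This request took too long. Please try again.",
    "invalid_request": "Your input couldn't be processed. Try reducing its size.",
    "blocked": "We can't help with that request. Try rephrasing it.",
}

# Internals (auth_error, unknown, raw provider errors) never reach the UI.
FALLBACK_MESSAGE = "Something went wrong on our side. Please try again later."


def user_message_for(category: str) -> str:
    return USER_MESSAGES.get(category, FALLBACK_MESSAGE)
```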
Minimal metrics (the few that matter)
If you track only a few things, track:
- success rate: % ok
- p50/p95 latency: how long calls take
- error rate by category: rate_limit vs timeout vs invalid_output
- calls per success: retries inflate this
- tokens per success: cost proxy
These metrics will tell you where to invest: prompt size, caching, retries, model selection, or schema tightening.
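A minimal sketch of computing these from per-call log records like the ones above, using only the standard library; statistics.quantiles needs at least two records:

```python
from collections import Counter
from statistics import quantiles


def summarize(calls: list[dict]) -> dict:
    """Compute the handful of metrics that matter from per-call log records."""
    ok = [c for c in calls if c["outcome_category"] == "ok"]
    cuts = quantiles(sorted(c["latency_ms"] for c in calls), n=100)
    n_ok = len(ok) or 1  # avoid division by zero when nothing succeeded
    total_tokens = sum(
        c["token_estimates"]["input"] + c["token_estimates"]["output"] for c in calls
    )
    return {
        "success_rate": len(ok) / len(calls),
        "p50_latency_ms": cuts[49],
        "p95_latency_ms": cuts[94],
        "errors_by_category": {
            k: v
            for k, v in Counter(c["outcome_category"] for c in calls).items()
            if k != "ok"
        },
        "calls_per_success": len(calls) / n_ok,    # retries inflate this
        "tokens_per_success": total_tokens / n_ok,  # rough cost proxy
    }
```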
Copy-paste templates
Template: retry policy (drop-in text)
Retry policy:
- Retryable: rate_limit, transient network errors, some timeouts
- Max attempts: 3
- Backoff: exponential + jitter
- Non-retryable: auth_error, invalid_request, blocked
- Invalid output: at most 1 retry with stricter schema/repair prompt
- Log: request_id, attempt, outcome_category, latency_ms
Template: outcome categories
Outcome categories:
- ok
- blocked
- rate_limit
- timeout
- network_error
- invalid_request
- invalid_output
- auth_error
- unknown
Template: user-facing copy
We couldn’t complete this request (timeout / rate limit).
Please wait a moment and try again.
If this keeps happening, reduce input size or try later.