19.1 Turning logs into hypotheses
Goal: turn messy logs into a ranked hypothesis list
Logs are noisy and incidents are stressful. The model can help by doing one thing extremely well:
turning logs + context into a short ranked hypothesis list with confirming/denying tests.
The key is forcing the model to anchor every hypothesis in evidence.
In incident mode, you don’t want “try these random changes.” You want “here are the likely causes and the fastest checks to prove/disprove them.”
Quick triage checklist (before you ask the model)
Do a 2–5 minute triage pass so your prompt is grounded:
- Scope: which users/requests are affected?
- Timing: when did it start?
- Recent changes: what deployed/changed right before?
- Error categories: auth vs rate limit vs timeout vs invalid output?
- Rate/latency: is error rate increasing? are p95s spiking?
- Blast radius: one endpoint/service or many?
This context helps the model rank hypotheses correctly.
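The triage pass above can be captured as structured context before prompting. A minimal sketch (the field names and helper are illustrative, not part of any library):

```python
# Illustrative: the six triage questions as named fields.
TRIAGE_FIELDS = [
    "scope",            # which users/requests are affected?
    "timing",           # when did it start?
    "recent_changes",   # what deployed/changed right before?
    "error_categories", # auth vs rate limit vs timeout vs invalid output?
    "rate_latency",     # error rate increasing? p95s spiking?
    "blast_radius",     # one endpoint/service or many?
]

def triage_summary(answers: dict) -> str:
    """Render triage answers into a prompt-ready bullet list.

    Missing answers are rendered as 'unknown' so the model knows
    which context you could not gather in the 2-5 minute pass.
    """
    lines = []
    for field in TRIAGE_FIELDS:
        label = field.replace("_", " ")
        lines.append(f"- {label}: {answers.get(field, 'unknown')}")
    return "\n".join(lines)
```

Rendering "unknown" explicitly is deliberate: it stops the model from silently assuming context you never provided.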
Build a “log pack” (high-signal context)
A log pack is the minimum set of evidence that makes diagnosis possible.
Include:
- the exact error messages / stack traces
- request ids or correlation ids for a few failing cases
- a few adjacent “successful” cases for contrast (if available)
- service/module names in the logs
- timestamps and environment (prod/staging)
- any rate limit/quota indicators
Redact secrets and PII.
Paste the relevant excerpt around the error and the correlation id trail. Huge logs reduce signal and increase confusion.
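One way to mechanize this is a small filter that keeps only the lines around a failing correlation id and redacts obvious secrets before anything leaves your terminal. A sketch, assuming simple key=value secrets and email-shaped PII (real redaction needs patterns tuned to your log format):

```python
import re

# Illustrative patterns only -- extend for your own secret/PII formats.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[=:]\s*\S+"),
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

def build_log_pack(lines, correlation_id, context=2):
    """Keep lines mentioning the correlation id plus `context` lines
    around each hit, with secrets/PII redacted."""
    keep = set()
    for i, line in enumerate(lines):
        if correlation_id in line:
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    pack = []
    for i in sorted(keep):
        line = lines[i]
        for pat in SECRET_PATTERNS:
            line = pat.sub("[REDACTED]", line)
        pack.append(line)
    return "\n".join(pack)
```

The context window around each hit preserves the "adjacent successful cases for contrast" without pasting the whole file.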
How to ask for hypotheses (without guessing)
Demand structure:
- 3–5 hypotheses maximum
- ranked by likelihood
- each hypothesis linked to specific evidence (“log line X suggests Y”)
- each hypothesis includes a confirming test and a denying test
This prevents the model from listing 20 vague possibilities.
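If you parse the model's answer into structured form, the same constraints can be checked mechanically. A minimal sketch (the dataclass and validator are hypothetical helpers, not from any framework):

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    statement: str        # what we think went wrong
    evidence: str         # "log line X suggests Y"
    confirming_test: str  # result that would support the hypothesis
    denying_test: str     # result that would rule it out
    likelihood: float     # ranked estimate, highest first

def validate(hypotheses):
    """Enforce the structure demanded above: 3-5 items, each anchored
    in evidence, ranked by likelihood."""
    assert 3 <= len(hypotheses) <= 5, "ask for 3-5 hypotheses, not 20"
    assert all(h.evidence for h in hypotheses), "every hypothesis needs evidence"
    likelihoods = [h.likelihood for h in hypotheses]
    assert likelihoods == sorted(likelihoods, reverse=True), "must be ranked"
    return True
```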
Turn hypotheses into tests (fast disproof)
The fastest incident teams disprove hypotheses quickly:
- pick the cheapest test that distinguishes the top two hypotheses
- run it
- update the hypothesis list
- repeat
You want to narrow from “5 plausible causes” to “1 confirmed cause” as fast as possible.
“How can we prove this hypothesis wrong quickly?” often leads to better next steps than “what should we try?”
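Picking the cheapest discriminating test can itself be made explicit. A sketch under the assumption that each hypothesis lists its tests and you have rough costs (names here are invented for illustration):

```python
def cheapest_discriminating_test(top_two, cost):
    """Pick the cheapest test listed under exactly one of the top two
    hypotheses -- its outcome shifts likelihood between them, while a
    shared test cannot tell them apart."""
    a, b = top_two
    candidates = set(a["tests"]) ^ set(b["tests"])  # symmetric difference
    if not candidates:
        return None  # need a new test that distinguishes them
    return min(candidates, key=lambda t: cost.get(t, float("inf")))
```

After running the chosen test, drop or down-rank the disproved hypothesis and repeat with the new top two.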
Copy-paste prompts
Prompt: hypotheses + tests only
We have an incident. Do NOT propose fixes yet.
Context:
- System: [brief description]
- Recent changes: [deploy/version/time]
- Environment: [prod/staging]
Evidence (log pack):
```text
...
```
Task:
1) Provide 3–5 ranked hypotheses for the root cause.
2) For each hypothesis:
- cite the log evidence supporting it
- give one confirming test and one denying test
3) Recommend the single cheapest next test to run.
Stop after that.
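During an incident you will fill this template repeatedly, so it is worth templating once. A minimal sketch that assembles the prompt above from structured fields (the helper name is illustrative):

```python
# Mirrors the copy-paste prompt above; the fenced log-pack block is
# passed in pre-formatted by the caller.
PROMPT = (
    "We have an incident. Do NOT propose fixes yet.\n\n"
    "Context:\n"
    "- System: {system}\n"
    "- Recent changes: {recent_changes}\n"
    "- Environment: {environment}\n\n"
    "Evidence (log pack):\n{log_pack}\n\n"
    "Task:\n"
    "1) Provide 3-5 ranked hypotheses for the root cause.\n"
    "2) For each hypothesis:\n"
    "   - cite the log evidence supporting it\n"
    "   - give one confirming test and one denying test\n"
    "3) Recommend the single cheapest next test to run.\n"
    "Stop after that.\n"
)

def render_prompt(system, recent_changes, environment, log_pack):
    """Fill the incident-prompt template from structured fields."""
    return PROMPT.format(system=system, recent_changes=recent_changes,
                         environment=environment, log_pack=log_pack)
```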
Prompt: request missing evidence
What additional logs/metrics would you need to confirm the top 2 hypotheses?
List them as a checklist (exact fields/queries if possible).
Stop after the checklist.
Anti-patterns (what not to do)
- Posting a screenshot instead of copy/paste logs
- Asking “why is this broken?” without repro + error output
- Accepting broad rewrites as “fixes” during incidents
- Retrying blindly and causing retry storms
- Logging secrets to debug faster