22.3 Citation-like behavior (trace outputs to sources)

Goal: answers you can verify

“Citation-like behavior” is how you keep long-context work from turning into confident fiction.

The idea is simple: every important claim should be traceable to a source. In practice, you need:

  • stable chunk ids,
  • quotes (short excerpts),
  • clear mapping between claims and sources.
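As a minimal sketch of the first requirement, a chunk id can combine a human-readable prefix with a short content hash, so the id stays stable across re-chunking as long as the text itself is unchanged. The function name and id layout here are illustrative assumptions, not a format the guide prescribes:

```python
import hashlib

def chunk_id(doc_name: str, section: str, text: str) -> str:
    """Stable chunk id: readable prefix plus a short content hash.

    The hash changes only if the chunk text changes, so citations
    keep pointing at the same content across re-indexing runs.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]
    return f"{doc_name}:{section}:{digest}"

cid = chunk_id("policy", "3.2", "Refunds are issued within 14 days.")
# cid looks like "policy:3.2:<8-char hash>"
```

The readable prefix (`policy:3.2`) is what humans scan; the hash suffix is what guards against the text silently changing under a reused id.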

What “citation-like” means in practice

You don’t need academic citations. You need:

  • Traceability: “This sentence came from chunk X.”
  • Verifiability: enough quote/context to confirm it.
  • Honesty about gaps: the model says “not found” when sources don’t support the claim.

Never allow invented citations

A made-up citation is worse than no citation: it gives false confidence. If the model can’t cite, it should say it can’t.

Citation formats that work

Pick one stable format and enforce it:

  • Chunk id references: [policy:3.2]
  • Inline claim mapping: each bullet ends with a citation list.
  • Separate citations array: structured output that links claims to sources.

For reliability, prefer structured output: it is easier to validate and to render in an app.

A citation-first output schema

Here’s a practical schema for grounded answers:

{
  "answer": {
    "summary": string,
    "bullets": [{
      "claim": string,
      "sources": [{ "chunk_id": string, "quote": string }]
    }]
  },
  "not_found": {
    "missing_claims": string[],
    "searched_chunks": string[]
  }
}

This forces the model to attach sources per claim.
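Because the schema is structured, enforcing it in app code is straightforward. A minimal fail-closed validator might look like this (the function name and error strings are illustrative assumptions):

```python
def validate_answer(payload: dict) -> list[str]:
    """Fail-closed check for the citation-first schema above.

    Returns a list of problems; an empty list means every bullet
    carries at least one citation with both chunk_id and quote.
    """
    problems: list[str] = []
    bullets = payload.get("answer", {}).get("bullets", [])
    for i, bullet in enumerate(bullets):
        sources = bullet.get("sources", [])
        if not sources:
            problems.append(f"bullet {i} has no sources")
        for src in sources:
            if not src.get("chunk_id"):
                problems.append(f"bullet {i}: source missing chunk_id")
            if not src.get("quote"):
                problems.append(f"bullet {i}: source missing quote")
    return problems
```

If the returned list is non-empty, treat the model output as invalid and retry rather than rendering uncited claims.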

Verification habits

To keep citations honest:

  • Spot-check: verify at least one citation per answer while prototyping.
  • Quote limits: keep quotes short (one or two sentences) to reduce copy errors.
  • Require direct support: quotes should support the claim, not just mention related terms.
  • Fail closed: if citations are missing, treat output as invalid and retry.
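The "require direct support" and "fail closed" habits can be partly automated: check that each cited quote actually appears in the referenced chunk. This sketch assumes chunks are kept in a dict keyed by chunk id, and normalizes whitespace because models often reflow quoted text; it catches fabricated quotes and wrong ids, though judging whether a real quote truly supports the claim still needs a human or a grader model:

```python
import re

def _norm(s: str) -> str:
    """Collapse whitespace and lowercase, so reflowed quotes still match."""
    return re.sub(r"\s+", " ", s).strip().lower()

def check_citations(payload: dict, chunks: dict[str, str]) -> list[str]:
    """Verify every citation names a real chunk and a quote found in it."""
    errors: list[str] = []
    for bullet in payload["answer"]["bullets"]:
        for src in bullet["sources"]:
            chunk = chunks.get(src["chunk_id"])
            if chunk is None:
                errors.append(f"unknown chunk_id {src['chunk_id']!r}")
            elif _norm(src["quote"]) not in _norm(chunk):
                errors.append(f"quote not found in {src['chunk_id']}")
    return errors
```

Run this before rendering; any error means the answer fails closed and is regenerated.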

Copy-paste prompts

Prompt: claim-by-claim citations

You must answer using ONLY the provided chunks.

Rules:
- Every bullet must have at least 1 citation.
- A citation must include: chunk_id and a direct quote that supports the claim.
- If you cannot find support for a claim, do not include the claim. Instead list it under not_found.missing_claims.

Chunks:
[chunk_id: ...]
```text
...
```

Question: [question]

Return valid JSON with this schema:
{
  "answer": {
    "summary": string,
    "bullets": [{ "claim": string, "sources": [{ "chunk_id": string, "quote": string }] }]
  },
  "not_found": { "missing_claims": string[], "searched_chunks": string[] }
}

Anti-patterns

  • “Cite the document” without chunk ids (not verifiable).
  • Long quotes that hide weak support.
  • Sources that don’t match claims (keyword overlap ≠ support).
  • Letting the model add “helpful context” that isn’t sourced.

Where to go next