# 22.3 Citation-like behavior (trace outputs to sources)
## Goal: answers you can verify
“Citation-like behavior” is how you keep long-context work from turning into confident fiction.
The idea is simple: every important claim should be traceable to a source. In practice, that means:
- stable chunk ids,
- quotes (short excerpts),
- clear mapping between claims and sources.
## What “citation-like” means in practice
You don’t need academic citations. You need:
- Traceability: “This sentence came from chunk X.”
- Verifiability: enough quote/context to confirm it.
- Honesty about gaps: the model says “not found” when sources don’t support the claim.
**Never allow invented citations.** A made-up citation is worse than no citation: it gives false confidence. If the model can’t cite, it should say it can’t.
## Citation formats that work
Pick one stable format and enforce it:
- Chunk id references: e.g. `[policy:3.2]`.
- Inline claim mapping: each bullet ends with a citation list.
- Separate citations array: structured output that links claims to sources.
For reliability, prefer structured output: it is easier to validate and to render in an app.
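For example, an inline claim mapping might look like this (chunk id and claim text are illustrative):

```text
- Refunds are issued within 30 days of purchase. [policy:3.2]
- Refunds require proof of purchase. [policy:3.4] [faq:1.1]
```

The structured-output equivalent of the same mapping is the schema in the next section.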
## A citation-first output schema
Here’s a practical schema for grounded answers:
```
{
  "answer": {
    "summary": string,
    "bullets": [{
      "claim": string,
      "sources": [{ "chunk_id": string, "quote": string }]
    }]
  },
  "not_found": {
    "missing_claims": string[],
    "searched_chunks": string[]
  }
}
```
This forces the model to attach sources per claim.
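A schema you can validate is a schema you can enforce. Here is a minimal validator sketch for the shape above (function and error-message wording are assumptions, not a fixed API):

```python
# Minimal validator for the citation-first schema above.
# Returns a list of problems; an empty list means the output passes.
def validate_grounded_answer(data: dict) -> list[str]:
    problems = []
    answer = data.get("answer")
    if not isinstance(answer, dict):
        return ["missing 'answer' object"]
    for i, bullet in enumerate(answer.get("bullets", [])):
        sources = bullet.get("sources", [])
        if not sources:
            # Every claim must carry at least one citation.
            problems.append(f"bullet {i} has no sources")
        for src in sources:
            if not src.get("chunk_id") or not src.get("quote"):
                problems.append(f"bullet {i} has an incomplete source")
    return problems
```

If the returned list is non-empty, treat the answer as invalid and retry rather than rendering it.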
## Verification habits
To keep citations honest:
- Spot-check: verify at least one citation per answer while prototyping.
- Quote limits: keep quotes short (one or two sentences) to reduce copy errors.
- Require direct support: quotes should support the claim, not just mention related terms.
- Fail closed: if citations are missing, treat output as invalid and retry.
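The “fail closed” habit can be automated: every quote must appear verbatim in the chunk it cites. A sketch, assuming `chunks` maps chunk ids to chunk text (names are illustrative):

```python
# Fail-closed quote check: every cited quote must appear verbatim in
# its cited chunk, otherwise the whole answer is rejected.
def quotes_are_verbatim(data: dict, chunks: dict[str, str]) -> bool:
    for bullet in data.get("answer", {}).get("bullets", []):
        for src in bullet.get("sources", []):
            chunk_text = chunks.get(src.get("chunk_id", ""), "")
            if src.get("quote", "") not in chunk_text:
                return False  # invented or mangled quote: fail closed
    return True
```

An exact substring match is deliberately strict; if your chunking normalizes whitespace, normalize both sides the same way before comparing.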
## Copy-paste prompts
### Prompt: claim-by-claim citations
````text
You must answer using ONLY the provided chunks.

Rules:
- Every bullet must have at least 1 citation.
- A citation must include: chunk_id and a direct quote that supports the claim.
- If you cannot find support for a claim, do not include the claim. Instead list it under not_found.missing_claims.

Chunks:
[chunk_id: ...]
```text
...
```

Question: [question]

Return valid JSON with this schema:
{
  "answer": {
    "summary": string,
    "bullets": [{ "claim": string, "sources": [{ "chunk_id": string, "quote": string }] }]
  },
  "not_found": { "missing_claims": string[], "searched_chunks": string[] }
}
````
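The `Chunks:` section of this prompt is easiest to fill programmatically. A small helper sketch, assuming your chunks live in a dict of id to text (names are illustrative):

```python
# Build the "Chunks:" block of the prompt from chunk data, so each
# chunk is tagged with the id the model must cite.
def build_chunks_block(chunks: dict[str, str]) -> str:
    parts = []
    for chunk_id, text in chunks.items():
        parts.append(f"[chunk_id: {chunk_id}]\n```text\n{text}\n```")
    return "\n\n".join(parts)
```

Keeping the ids in one place like this also gives you the `chunks` mapping you need later to verify quotes against their sources.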
## Anti-patterns
- “Cite the document” without chunk ids (not verifiable).
- Long quotes that hide weak support.
- Sources that don’t match claims (keyword overlap ≠ support).
- Letting the model add “helpful context” that isn’t sourced.