31.2 Output filtering and schema enforcement

Goal: treat model output as untrusted input

Model output is not “trusted application state.” It is untrusted data that must be validated before it can:

  • be rendered to users,
  • be written to a database,
  • be used to call tools,
  • be used to make product decisions.

Your job is to enforce a contract that makes unsafe outputs harmless.

Never execute raw model output

If model output will be run as code, SQL, or shell commands, validate it strictly and execute it only inside a sandbox with minimal privileges. Prefer “proposal-only” patterns, in which the model proposes an action and a human approves it before anything runs.
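A minimal sketch of the proposal-only idea, under stated assumptions: the model's output has already been parsed into an inert proposal, and nothing runs until the action is allowlisted and a human has approved it. The names `Proposal`, `ALLOWED_ACTIONS`, and `execute_if_approved` are hypothetical and only illustrate the shape of the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical allowlist: the only actions this app is ever willing to run.
ALLOWED_ACTIONS = {"create_ticket", "send_summary"}


@dataclass
class Proposal:
    action: str
    args: dict
    approved_by: Optional[str] = None  # set only by a human review step


def execute_if_approved(proposal: Proposal, handlers: Dict[str, Callable]) -> str:
    """Run a proposed action only if it is allowlisted and human-approved."""
    if proposal.action not in ALLOWED_ACTIONS:
        return "rejected: action not allowlisted"
    if proposal.approved_by is None:
        return "pending: awaiting human approval"
    # Only now does anything with side effects happen.
    handlers[proposal.action](**proposal.args)
    return "executed"
```

The model never supplies code to run; it can only name an action that the application already implements.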

Schema enforcement: your strongest “no hallucination” tool

Schema enforcement doesn’t stop hallucination of content, but it stops hallucination of structure and makes the system testable.

Practical benefits:

  • your app can parse outputs reliably,
  • you can validate required fields,
  • you can reject unknown keys or dangerous fields,
  • you can enforce “not found,” “conflict,” and “refused” states explicitly.

Good schema design rules:

  • Make abstention explicit: include a status field with explicit values such as not_found.
  • Use enums: constrain modes and statuses.
  • Allow nulls for unknowns: don’t force guessing.
  • Keep it small: giant schemas are brittle.

Validation pipeline (parse → validate → sanitize)

A safe output pipeline usually looks like:

  1. Parse: strict JSON parse (or strict structured format).
  2. Validate schema: required fields, types, enums.
  3. Validate policy: no forbidden content; no disallowed tool requests.
  4. Sanitize for rendering: escape HTML; prevent script injection; truncate overly long fields.
  5. Apply business rules: reject unsafe scopes; require approvals; enforce permissions.

If any step fails: retry with stricter instructions, or fall back to a safe refusal/not_found state.
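The first four steps above can be sketched roughly as follows. This is an illustration, not a definitive implementation: the three-key schema, `FORBIDDEN_TOOLS`, `MAX_SUMMARY_CHARS`, and `SAFE_FALLBACK` are all assumptions made for the example, and step 5 (business rules) is left out because it is application-specific.

```python
import html
import json

ALLOWED_STATUSES = {"answered", "not_found", "refused"}
FORBIDDEN_TOOLS = {"shell", "delete_user"}  # hypothetical policy list
MAX_SUMMARY_CHARS = 500                     # hypothetical size budget
SAFE_FALLBACK = {"status": "not_found", "summary": "", "tool": None}


def process_output(raw: str) -> dict:
    """Parse -> validate schema -> validate policy -> sanitize. Fails closed."""
    # 1. Parse: strict JSON; anything else is rejected.
    try:
        data = json.loads(raw)
    except ValueError:
        return dict(SAFE_FALLBACK)

    # 2. Validate schema: exact key set, types, enum values.
    if not isinstance(data, dict) or set(data) != {"status", "summary", "tool"}:
        return dict(SAFE_FALLBACK)
    status, summary, tool = data["status"], data["summary"], data["tool"]
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        return dict(SAFE_FALLBACK)
    if not isinstance(summary, str):
        return dict(SAFE_FALLBACK)
    if tool is not None and not isinstance(tool, str):
        return dict(SAFE_FALLBACK)

    # 3. Validate policy: no disallowed tool requests.
    if tool in FORBIDDEN_TOOLS:
        return dict(SAFE_FALLBACK)

    # 4. Sanitize for rendering: truncate first, then escape HTML.
    safe_summary = html.escape(summary[:MAX_SUMMARY_CHARS])
    return {"status": status, "summary": safe_summary, "tool": tool}
```

In this sketch every failure returns the same safe fallback; a real pipeline might retry a bounded number of times first.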

Validation belongs in code

Prompting helps, but only validation in code is enforceable. Put critical constraints in code so they can’t be bypassed by injection.

Output filtering: what to block or redact

Filtering rules depend on your product, but common categories include:

  • Secrets and tokens: block or redact if patterns are detected.
  • PII: redact unless explicitly allowed and necessary.
  • Policy violations: disallowed content or instructions.
  • Unsafe instructions: attempts to call disallowed tools or request privileged operations.
  • Excessive output: truncate or reject outputs beyond size budgets.

Filtering is not perfect, but it is a strong “last line of defense” against accidental leaks and obvious injection outcomes.
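As a rough illustration of the redaction and size-budget categories above, the sketch below uses a few regular expressions. The patterns are examples only (an AWS-style access key ID shape, a PEM private key header, and a loose email pattern), and `MAX_OUTPUT_CHARS` is a hypothetical budget; real filters need patterns chosen for the secrets and PII your system actually handles.

```python
import re

# Illustrative patterns only; they will miss many real secrets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS-style access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # loose email match
MAX_OUTPUT_CHARS = 2000  # hypothetical size budget


def filter_output(text: str) -> str:
    """Redact secret-like and email-like substrings, then enforce a size cap."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED_SECRET]", text)
    text = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)
    # Truncate after redaction so a cut cannot leave a partial secret behind.
    return text[:MAX_OUTPUT_CHARS]
```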

Citation integrity checks (for RAG)

For grounded systems, citations are a contract. Validate them:

  • Chunk id validity: cited ids must be among retrieved sources.
  • Quote containment: quote text must appear in the chunk text.
  • Per-claim citations: every claim must have at least one citation.

This prevents “invented citations” and makes injection harder to hide.
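The three checks above might look like this in code. The input shapes are assumptions for the sketch: `claims` is a list of objects with a `citations` list of `{chunk_id, quote}` entries (already schema-validated), and `retrieved` maps each chunk id that was actually sent to the model to its text. Quote containment here is an exact substring match; a real system may want to normalize whitespace first.

```python
def check_citations(claims: list, retrieved: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for i, claim in enumerate(claims):
        citations = claim.get("citations") or []
        # Per-claim citations: every claim needs at least one.
        if not citations:
            problems.append(f"claim {i}: no citation")
        for c in citations:
            chunk = retrieved.get(c["chunk_id"])
            # Chunk id validity: the id must be among the retrieved sources.
            if chunk is None:
                problems.append(f"claim {i}: unknown chunk id {c['chunk_id']!r}")
            # Quote containment: the quote must appear in the chunk text.
            elif c["quote"] not in chunk:
                problems.append(f"claim {i}: quote not found in chunk {c['chunk_id']!r}")
    return problems
```

Any non-empty result can be treated as a validation failure and handled like any other rejected output.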

Retries and fallbacks (fail closed)

When output validation fails, you should not “ship the best effort.”

Practical retry strategy:

  • Retry 1: remind “JSON only,” repeat schema, reduce verbosity.
  • Retry 2: reduce context, reduce number of tasks, switch to more deterministic settings or a model that follows the format more reliably.
  • Fallback: return status="not_found" or "refused" with next steps.

Hard cap retries. Otherwise validation failures become cost bombs.
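A capped retry loop that fails closed could be sketched like this. `call_model` and `is_valid` are hypothetical hooks supplied by the caller: `call_model(attempt)` is expected to tighten instructions or shrink context as the attempt number grows, and `is_valid` stands in for whatever schema and policy checks the application uses.

```python
import json
from typing import Callable

MAX_ATTEMPTS = 3  # hard cap: the first try plus two retries
FALLBACK = {"status": "not_found", "answer": None}


def generate_with_retries(call_model: Callable[[int], str],
                          is_valid: Callable[[dict], bool]) -> dict:
    """Return the first valid output, or a safe fallback once the cap is hit."""
    for attempt in range(MAX_ATTEMPTS):
        raw = call_model(attempt)
        try:
            data = json.loads(raw)
        except ValueError:
            continue  # unparseable output counts as a failed attempt
        if isinstance(data, dict) and is_valid(data):
            return data
    # Fail closed: never return an unvalidated best effort.
    return dict(FALLBACK)
```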

Template: safe output contract

A minimal safe contract for many features:

{
  "status": "answered" | "not_found" | "needs_clarification" | "conflict" | "refused",
  "answer": { "summary": string, "bullets": string[] } | null,
  "missing_info": string[],
  "follow_up_question": string | null
}

For RAG, extend bullets into per-claim citations (see Part VIII citation schemas).
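One possible hand-written validator for the contract above, using only the standard library. It is a sketch: the function names are hypothetical, unknown keys are rejected outright, and the final rule (an "answered" status requires a non-null answer) is an assumption that the template itself does not state.

```python
STATUSES = {"answered", "not_found", "needs_clarification", "conflict", "refused"}
REQUIRED_KEYS = {"status", "answer", "missing_info", "follow_up_question"}


def _is_str_list(value) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_contract(data) -> list:
    """Return a list of error strings; an empty list means the output conforms."""
    if not isinstance(data, dict):
        return ["top level must be an object"]
    if set(data) != REQUIRED_KEYS:
        return [f"keys must be exactly {sorted(REQUIRED_KEYS)}"]

    errors = []
    status, answer = data["status"], data["answer"]
    if not isinstance(status, str) or status not in STATUSES:
        errors.append("status: not an allowed value")
    if answer is not None and not (
        isinstance(answer, dict)
        and set(answer) == {"summary", "bullets"}
        and isinstance(answer["summary"], str)
        and _is_str_list(answer["bullets"])
    ):
        errors.append("answer: must be null or {summary, bullets}")
    if not _is_str_list(data["missing_info"]):
        errors.append("missing_info: must be a list of strings")
    follow_up = data["follow_up_question"]
    if follow_up is not None and not isinstance(follow_up, str):
        errors.append("follow_up_question: must be a string or null")
    # Assumed cross-field rule, not stated in the template.
    if status == "answered" and answer is None:
        errors.append("status 'answered' requires a non-null answer")
    return errors
```

A schema library could replace most of this; the point is that every field, type, and enum value is checked in code before the output is used.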