21.2 Extracting structured data from images (carefully)
Goal: extraction you can trust
Extracting data from images is tempting because it feels “direct”: upload a screenshot of a receipt, get structured fields back.
The reality: image extraction is a lossy process. The only way to make it reliable is to treat it like engineering:
- define a schema,
- allow unknowns,
- capture confidence and evidence,
- verify with sampling and automated checks.
In extraction, guessing is corruption. Prefer null + a reason.
Why image extraction fails in predictable ways
Most failures fall into a few buckets:
- Legibility: low resolution, blur, compression artifacts, small fonts.
- Ambiguity: similar-looking characters (0/O, 1/l), unclear decimals, cut-off text.
- Layout confusion: multi-column documents, tables, dense receipts, wrapped lines.
- Missing context: currency/unit not visible, date format ambiguous, partial screenshot.
- Invented structure: the model outputs “the kind of fields this document usually has,” not what’s actually shown.
Your prompts should directly counter these failure modes.
Define an extraction contract (schema + rules)
An extraction contract has two parts:
- Schema: exact JSON fields, types, and allowed values.
- Rules: what to do when information is missing or unclear.
Good contract rules include:
- No guessing: use null when a value is not visible.
- Capture uncertainty: per-field confidence (e.g., high|medium|low).
- Capture evidence: include the exact text snippet you read (or a short quote).
- Normalize carefully: preserve original formatting in raw fields when in doubt.
- Include an “unparsed” bucket: leftover text that didn’t fit the schema.
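The contract rules above can be enforced in code before anything downstream touches the data. A minimal sketch, assuming a per-field layout of value/confidence/evidence (the helper name and field shapes here are illustrative, not a fixed API):

```python
# Allowed confidence labels from the contract.
ALLOWED_CONFIDENCE = {"high", "medium", "low"}

def check_field(field: dict) -> list[str]:
    """Return a list of contract violations for one extracted field."""
    errors = []
    if field.get("confidence") not in ALLOWED_CONFIDENCE:
        errors.append("confidence must be high|medium|low")
    # No guessing: a non-null value must be backed by evidence text.
    if field.get("value") is not None and not field.get("evidence"):
        errors.append("non-null value without evidence")
    return errors

field_ok = {"value": "2024-03-01", "confidence": "high", "evidence": "Date: 03/01/2024"}
field_bad = {"value": 12.5, "confidence": "certain", "evidence": None}
```

Fields that fail these checks are exactly the ones you should route to review rather than silently accept.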
For varied inputs, extract in two passes. Pass 1: identify the document type and which fields are actually present. Pass 2: extract with the exact schema for that document type.
Tables, charts, and dense layouts
For tables and dense receipts, you want to prevent “helpful reformatting”:
- Preserve ordering: represent rows as arrays; keep row order as seen.
- Separate raw vs parsed: store the exact string and a normalized numeric value (if possible).
- Validate arithmetic: totals should equal the sum of line items (when applicable).
- Explicitly define units: currency, time zone, measurement units.
For charts, be especially careful: pixel-based reading of axis labels is error-prone. If the task is important, prefer a tool-based approach (OCR + chart parsing) and use the model to validate or interpret, not to extract every tick value.
Verification loop and quality checks
To make extraction production-worthy, add checks:
- Sampling review: manually inspect a small set of extractions per batch.
- Schema validation: reject malformed JSON and retry with a stricter prompt.
- Consistency checks: totals, date formats, currency formats.
- Confidence gating: route low-confidence fields for human review.
- Golden set: maintain 25–100 labeled examples and track regression over time.
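Confidence gating from the checklist above can be a few lines of routing logic. A sketch, using the value/confidence field layout from the contract (the split itself is the point; the review queue is a stand-in):

```python
def gate(extraction: dict) -> tuple[dict, list[str]]:
    """Split fields into auto-accepted values and names needing human review."""
    accepted = {}
    needs_review = []
    for name, field in extraction["fields"].items():
        # Low confidence or a null value means a human should look at it.
        if field["confidence"] == "low" or field["value"] is None:
            needs_review.append(name)
        else:
            accepted[name] = field["value"]
    return accepted, needs_review
```

Tracking the size of the review queue per batch also gives you a free quality metric: if it suddenly grows, something upstream (image quality, prompt, model version) changed.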
Copy-paste prompts
Prompt: strict JSON extraction with evidence
Extract data from the attached image. Output MUST be valid JSON.
Rules:
- Do not guess. If a value is not clearly visible, use null.
- For each extracted field, include a confidence: "high" | "medium" | "low".
- Include the exact raw text snippet you used as evidence.
Return JSON with this schema:
{
"document_type": string,
"fields": {
"date": { "value": string|null, "confidence": string, "evidence": string|null },
"total": { "value": number|null, "confidence": string, "evidence": string|null, "raw": string|null },
"currency": { "value": string|null, "confidence": string, "evidence": string|null }
},
"line_items": [{
"description": { "value": string|null, "confidence": string, "evidence": string|null },
"amount": { "value": number|null, "confidence": string, "evidence": string|null, "raw": string|null }
}],
"unparsed_text": string[]
}
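Before trusting a reply to this prompt, validate its structure. A minimal stdlib-only check against the schema above (reject-and-retry logic would wrap this; only the validation step is shown):

```python
import json

REQUIRED_TOP = {"document_type", "fields", "line_items", "unparsed_text"}
REQUIRED_FIELD_KEYS = {"value", "confidence", "evidence"}

def validate_reply(reply: str) -> list[str]:
    """Return a list of structural errors; empty means the reply parses cleanly."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    errors = []
    missing = REQUIRED_TOP - set(data)
    if missing:
        errors.append(f"missing top-level keys: {sorted(missing)}")
    for name, field in data.get("fields", {}).items():
        if not REQUIRED_FIELD_KEYS <= set(field):
            errors.append(f"field '{name}' missing value/confidence/evidence")
    return errors
```

Any non-empty error list should trigger a retry with a stricter prompt, not a silent repair of the output.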
Prompt: arithmetic validation
You extracted line items and a total from the image.
Task:
1) Check whether sum(line_items.amount) matches total (within 0.01).
2) If it doesn’t match, list the most likely causes (missing item, tax, unreadable amount).
3) Propose the smallest follow-up question or image re-crop needed to resolve the mismatch.
Return a short checklist.