6.4 Use examples as "mini tests"

Why examples are so effective

Examples are the simplest way to make behavior concrete. They work because they:

  • reduce ambiguity (“this is what I mean”),
  • anchor edge cases without long explanations,
  • act like lightweight tests the model can reason about,
  • give you immediate verification targets after generation.

Examples are acceptance criteria with teeth

When you include input/output pairs, you’re not asking the model to “be correct.” You’re defining correctness.
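
Concretely, each pair can be read as an assertion. The snippet below uses a hypothetical evaluate() function purely for illustration; the name is not part of any real project.

# Each example pair is a literal acceptance criterion.
# evaluate() is a hypothetical function standing in for whatever gets built.
assert evaluate("2+2") == "4"
assert evaluate("1/2") == "0.5"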

How to choose high-value examples

Good examples are small and diagnostic. Aim for 5–10 total examples per feature.

1) Happy path examples

Include 2–4 simple examples that prove the main use case works.

2) Edge case examples

Include 2–4 that stress boundaries:

  • empty/whitespace input,
  • invalid format,
  • unexpected characters,
  • large inputs,
  • ambiguous cases that usually break parsers.

3) Error behavior examples

Include 1–2 examples that specify error outputs and exit codes. This prevents raw stack traces leaking to users and keeps failure behavior consistent.
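
Taken together, a first pass for the calculator-style CLI used in this section's examples might look like the set below, written in the JSON-ish format shown under formatting. The exit code and message fragment are placeholder choices for illustration, not requirements.

examples = [
  # happy path
  {"input": "2+2", "output": "4"},
  {"input": "1/2", "output": "0.5"},
  # edge cases
  {"input": " 2 + 2 ", "output": "4"},
  {"input": "-3 + 5", "output": "2"},
  # error behavior (exit code and message are placeholder choices)
  {"input": "2+*3", "error": {"exit_code": 2, "stderr_contains": "invalid syntax"}},
]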

If you only add one example, add a failing one

Models often get the happy path right. It’s the failure modes that make apps brittle.

How to format examples in prompts

Pick a format that is unambiguous and easy to copy/paste into tests. Common formats:

Table format

Examples:
1) Input: "2+2"      Output: "4"
2) Input: "1/2"      Output: "0.5"
3) Input: "2+*3"     Error: exit 2, stderr contains "invalid syntax"

JSON-ish format

examples = [
  {"input": "2+2", "output": "4"},
  {"input": "2*(3+4)", "output": "14"},
  {"input": "2+*3", "error": {"exit_code": 2}},
]

Choose one format and keep it consistent across your prompts and your projects.

Examples that catch edge cases

Here are the kinds of examples that catch real bugs:

  • Whitespace: " 2 + 2 "
  • Unary operators: "-3 + 5"
  • Parentheses: "2*(3+4)"
  • Invalid syntax: "2+*3"
  • Division by zero: define expected behavior explicitly
  • Empty input: define expected behavior explicitly

Define behavior, don’t outsource it

If you don’t define what happens on “division by zero” or “empty input,” the model will choose something. That choice will become your behavior unless you correct it later.
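
A low-effort way to do that is to add the ambiguous cases to the example set as explicit entries. The behaviors below (exit code 2 and the message fragments) are placeholder decisions that show the shape; substitute whatever your project should actually do.

examples += [
  # both behaviors here are deliberate choices, not defaults left to the model
  {"input": "1/0", "error": {"exit_code": 2, "stderr_contains": "division by zero"}},
  {"input": "",    "error": {"exit_code": 2, "stderr_contains": "empty input"}},
]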

Building a tiny “example suite”

Over time, keep a small example suite for your project:

  • It’s a test set you can rerun mentally.
  • It’s a regression set you can paste into prompts.
  • It reveals when a prompt change breaks prior behavior.

This is the seed of evaluation harnesses later in the guide—just scaled down to something you can do today.
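
When you want more than a mental rerun, the same list drops into a real test file with almost no ceremony. A minimal sketch using pytest, assuming a hypothetical calc module that exposes an evaluate() function (both names are placeholders):

import pytest

from calc import evaluate  # hypothetical module and function, named for illustration

# The suite is just data; extend it as new edge cases appear.
EXAMPLES = [
    ("2+2", "4"),
    ("2*(3+4)", "14"),
    (" 2 + 2 ", "4"),
]

@pytest.mark.parametrize("expr,expected", EXAMPLES)
def test_examples(expr, expected):
    assert evaluate(expr) == expected

Rerunning this after every prompt change is the cheapest regression signal you can get.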

A prompt template using mini tests

Task:
[Describe the change.]

Mini tests (examples):
- Input: [...]
  Expected: [...]
- Input: [...]
  Expected: [...]
- Input: [...]
  Expected: [error behavior]

Constraints:
- Language/runtime: [...]
- Dependencies: [...]

Output:
- Diff-only changes
- After the diff, restate how each mini test is satisfied
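
For illustration, here is the template filled in for the calculator example; the runtime and the exact error wording are arbitrary placeholder choices.

Task:
Add parentheses support to the expression evaluator.

Mini tests (examples):
- Input: "2*(3+4)"
  Expected: "14"
- Input: " 2 + 2 "
  Expected: "4"
- Input: "(1+2"
  Expected: exit 2, stderr contains "unbalanced parenthesis"

Constraints:
- Language/runtime: Python 3.12
- Dependencies: standard library only

Output:
- Diff-only changes
- After the diff, restate how each mini test is satisfied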

Where to go next