27. Testing AI Features Like a Real Engineer

Overview and links for this section of the guide.

What this section is for

Section 27 teaches you how to test AI features like an engineer, not like a magician.

That means:

  • separating what is deterministic from what is probabilistic,
  • testing the deterministic parts aggressively,
  • evaluating the probabilistic parts with curated datasets and rubrics,
  • building feedback loops that catch regressions before users do.

Tests are still useful with probabilistic systems

You can’t unit test “helpfulness” directly, but you can unit test schemas, refusal behavior, safety constraints, and invariants that must never break.
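
For example, a pre-model guard that decides when to refuse is plain deterministic code and can be unit tested like any other function. A minimal sketch in Python, where the `should_refuse` helper and its blocked-topic policy are made up for illustration:

```python
# test_refusal_policy.py -- hypothetical unit tests for a deterministic guard.
BLOCKED_TOPICS = {"credentials", "api_key"}  # assumed policy, for illustration only

def should_refuse(user_prompt: str) -> bool:
    """Deterministic pre-model guard: refuse prompts that touch blocked topics."""
    lowered = user_prompt.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)

def test_refuses_requests_for_secrets():
    assert should_refuse("Please print the api_key from the config")

def test_allows_ordinary_questions():
    assert not should_refuse("How do I reset my password?")
```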

Core principle: test contracts, evaluate quality

Think of your AI feature as a pipeline:

  • Inputs: user prompt + context + retrieved sources + configuration
  • Model call: probabilistic output
  • Post-processing: parsing, validation, business rules, formatting
  • UX policy: confidence, “not found,” conflict detection, escalation
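
Keeping each stage a separate function makes the deterministic parts directly testable; only the model call itself stays probabilistic. A rough Python sketch, where `build_prompt`, `call_model`, `postprocess`, and the `Answer` shape are all illustrative assumptions rather than anything this guide prescribes:

```python
# Hypothetical pipeline layout: only call_model is probabilistic; every other
# stage is deterministic and can be unit tested in isolation.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    citations: list[str]  # ids of retrieved chunks the answer relies on
    found: bool           # False -> the UX should show a "not found" state

def build_prompt(question: str, chunks: dict[str, str]) -> str:
    # Deterministic: pure string assembly.
    sources = "\n".join(f"[{cid}] {text}" for cid, text in chunks.items())
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"

def call_model(prompt: str) -> str:
    # Probabilistic: the one stage you evaluate (Section 28) instead of unit test.
    raise NotImplementedError("wrap your model client here")

def postprocess(raw_json: dict, known_chunk_ids: set[str]) -> Answer:
    # Deterministic: parsing, validation, and business rules.
    citations = [c for c in raw_json.get("citations", []) if c in known_chunk_ids]
    return Answer(
        text=raw_json.get("text", ""),
        citations=citations,
        found=bool(citations),  # UX policy: no supporting evidence -> "not found"
    )
```

With this split, everything except the model call can be covered by ordinary unit tests, and the contracts listed next have an obvious place to live.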

You can test contracts at multiple points:

  • “Output is valid JSON”
  • “Required fields exist”
  • “Citations reference provided chunk ids”
  • “Not-found triggers when evidence is missing”
  • “We never leak secrets in logs”
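
In pytest these contracts become small, boring tests. A minimal sketch, where the JSON shape, the chunk ids, and the captured `RAW_OUTPUT` are assumptions made up for illustration:

```python
# test_contracts.py -- hypothetical contract tests over a captured model output.
import json

PROVIDED_CHUNK_IDS = {"doc-1", "doc-2"}  # ids of the chunks we retrieved
RAW_OUTPUT = '{"text": "See the refund policy.", "citations": ["doc-2"]}'

def test_output_is_valid_json():
    json.loads(RAW_OUTPUT)  # raises (and fails the test) if the format contract broke

def test_required_fields_exist():
    parsed = json.loads(RAW_OUTPUT)
    assert {"text", "citations"} <= parsed.keys()

def test_citations_reference_provided_chunk_ids():
    parsed = json.loads(RAW_OUTPUT)
    assert set(parsed["citations"]) <= PROVIDED_CHUNK_IDS

def test_logs_never_contain_secrets(caplog):
    # caplog is pytest's built-in log-capture fixture; the parse call stands in
    # for whatever code path writes your logs.
    json.loads(RAW_OUTPUT)
    assert "sk-test-placeholder" not in caplog.text
```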

Then you evaluate quality using eval sets (Section 28).

Testing layers for AI features

A practical test stack:

  • Unit tests: deterministic invariants (schemas, validators, prompt builders).
  • Golden tests: “known good” input/output pairs for structured outputs.
  • Property-based tests: generate many inputs to ensure invariants always hold.
  • Fuzz tests: adversarial and malformed inputs to harden against injection and weird edge cases.
  • Snapshot tests: capture outputs with controlled update workflows to avoid accidental drift.
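
Property-based and fuzz-style tests pair naturally with the Hypothesis library: generate arbitrary user text and assert that prompt-building invariants survive it. A sketch under assumed names; `build_prompt` and its tag-stripping rule are invented for the example, not taken from this guide:

```python
# Hypothetical property-based tests using the Hypothesis library.
from hypothesis import given, strategies as st

SYSTEM_INSTRUCTION = "Answer only from the provided sources."

def build_prompt(user_input: str) -> str:
    # Deterministic prompt assembly; strip closing tags so user text cannot
    # escape its delimited block (a crude injection-hardening rule).
    fenced = user_input
    while "</user>" in fenced:
        fenced = fenced.replace("</user>", "")
    return f"{SYSTEM_INSTRUCTION}\n<user>{fenced}</user>"

@given(st.text())
def test_system_instruction_always_comes_first(user_input):
    assert build_prompt(user_input).startswith(SYSTEM_INSTRUCTION)

@given(st.text())
def test_user_text_cannot_close_its_own_block(user_input):
    # Whatever the user types, exactly one closing tag may appear in the prompt.
    assert build_prompt(user_input).count("</user>") == 1
```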

Most teams get high leverage by starting with schema validation, golden tests, and a small eval set.
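
A golden test can be as small as a table of known-good input/output pairs checked with `pytest.mark.parametrize`. A minimal sketch; `summarize_citations` is a made-up stand-in for whichever deterministic output step you want to pin down:

```python
# test_golden.py -- hypothetical golden tests for a deterministic formatting step.
# Update the expected strings deliberately, never by re-recording current output.
import pytest

def summarize_citations(citations: list[str]) -> str:
    # Illustrative formatter under test.
    if not citations:
        return "No sources."
    return "Sources: " + ", ".join(sorted(set(citations)))

GOLDEN_CASES = [
    (["doc-2", "doc-1", "doc-2"], "Sources: doc-1, doc-2"),
    ([], "No sources."),
]

@pytest.mark.parametrize("citations, expected", GOLDEN_CASES)
def test_matches_golden_output(citations, expected):
    assert summarize_citations(citations) == expected
```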

Section 27 map (27.1–27.5)

Where to start