7.3 The "write tests first" vibe pattern

The core idea

In vibe coding, the model can generate code faster than you can judge it. Tests flip that power balance: they give you a mechanical way to decide whether the output is correct.

The pattern is simple:

  1. Write tests first from acceptance criteria.
  2. Confirm tests match intent (this is a spec review).
  3. Implement the smallest change to make tests pass.
  4. Run tests after every diff and iterate.

Tests-first is not “academic”

It’s the fastest way to keep AI output honest. A failing test is better feedback than 20 paragraphs of explanation.

Why tests-first speeds up vibe coding

It speeds you up because it cuts down on two expensive activities:

  • Argument debugging: “it should work” back-and-forth without evidence.
  • Regression hunting: discovering later that you broke something earlier.

With tests, your loop becomes:

  • Change → run tests → see failure → fix → repeat.
  • Not: change → hope → discover breakage later → panic.

The model is good at writing tests, too

Ask it to propose test cases from acceptance criteria. You still review them, but it can generate the scaffolding and edge-case set quickly.

When to use tests-first

Tests-first is a great default when:

  • you are fixing a bug (regression test first),
  • you are refactoring (lock behavior before moving code),
  • the behavior is tricky (parsing, validation, serialization),
  • the output must be stable (structured output / schemas),
  • you are worried about breaking existing behavior.

You can skip tests-first when the change is truly trivial (but even then, at least run existing tests).

A practical workflow (tests → implementation → verify)

Step 0: define “done”

Tests-first starts with acceptance criteria (Section 6.3). If “done” is fuzzy, tests will be fuzzy too.
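
As a hypothetical example, acceptance criteria for a small "count lines" CLI command might read:

  • given a file with 3 lines, "count <file>" prints 3 and exits with code 0,
  • a missing file prints an error to stderr and exits with a non-zero code,
  • the output is just the number, with no extra text.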

Step 1: ask the model for tests only

Your prompt should explicitly forbid implementation. You are creating a spec artifact, not code.

Step 2: review the tests like a code review

Tests are part of your product contract. Review for:

  • coverage of the acceptance criteria,
  • edge cases and failure behavior,
  • determinism (no time/network flakiness),
  • minimal coupling to implementation details.

Step 3: make tests fail for the right reason

Run the tests before implementing. If tests already pass, they’re not testing the new behavior (or they’re too weak).

Step 4: implement the smallest change to pass

Ask for diff-only changes. One small diff per failing test cluster is usually ideal.

Step 5: iterate and lock in

Repeat until green, then commit. This becomes your stable base for the next feature.

Don’t “accept green” too easily

If the model makes tests pass by deleting the test, weakening assertions, or changing the acceptance criteria, that’s not success. That’s cheating. Keep the contract stable.

What good tests look like (for AI-generated code)

Good tests have three traits:

  • They test behavior, not structure: outputs, errors, exit codes, schemas.
  • They are easy to read: a test is documentation for “what we mean.”
  • They fail clearly: when broken, you can tell what changed.

Example: CLI behavior test

For CLI tools, prefer tests that call the entrypoint function and capture stdout/stderr, rather than shelling out to a subprocess (unless you need a true integration test).
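
A minimal sketch, assuming pytest and a hypothetical main(argv) entrypoint that prints its result and returns an exit code (the names here are illustrative, not taken from any real codebase):

  # test_cli.py (hypothetical): main(argv) writes to stdout and returns an exit code
  from mytool.cli import main  # hypothetical import path

  def test_count_reports_line_total(tmp_path, capsys):
      # Fixed input file keeps the test deterministic.
      sample = tmp_path / "sample.txt"
      sample.write_text("alpha\nbeta\ngamma\n")

      # Call the entrypoint in-process instead of shelling out.
      exit_code = main(["count", str(sample)])

      # Assert on behavior: exit code and stdout, not internals.
      out = capsys.readouterr().out
      assert exit_code == 0
      assert out.strip() == "3"

This checks the contract (exit code plus printed output) without depending on how main is implemented internally.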

Make your CLI testable

If your CLI reads from real stdin/out, refactor to allow injecting streams. This single design choice makes tests-first much easier.
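
One way to do that, shown here as a sketch under assumed names rather than a required design, is to give the entrypoint optional stream parameters that default to the real ones:

  # cli.py (hypothetical sketch): entrypoint with injectable streams
  import sys
  from typing import IO, Optional, Sequence

  def main(argv: Optional[Sequence[str]] = None,
           stdin: Optional[IO[str]] = None,
           stdout: Optional[IO[str]] = None) -> int:
      argv = list(sys.argv[1:]) if argv is None else list(argv)
      stdin = sys.stdin if stdin is None else stdin
      stdout = sys.stdout if stdout is None else stdout

      # Real argument parsing and work would go here; this just counts lines.
      text = stdin.read()
      stdout.write(f"{len(text.splitlines())}\n")
      return 0

Tests can then pass io.StringIO objects for stdin and stdout and assert on their contents, with no subprocess or terminal involved.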

Copy-paste prompt sequence

Prompt A: tests only

We are using the “write tests first” pattern.

Task:
[Describe the change/feature.]

Acceptance criteria:
- [...]

Constraints:
- Language/runtime: [...]
- Dependencies: [...]
- Do NOT implement the feature yet

Output:
- Diff-only changes that add/modify tests ONLY
- After the diff, explain how each test maps to an acceptance criterion

Prompt B: implement to pass tests

Now implement the feature to make the new tests pass.

Constraints:
- Keep the public API stable unless the tests require a change
- Do not weaken or delete the new tests
- Keep diffs small and focused

Output:
- Diff-only changes

Prompt C: fix one failing test minimally

Here is the failing test output:
(paste)

Fix the failure with the smallest change possible.

Constraints:
- Do not change tests unless the test is wrong (explain if so)
- Diff-only changes

Common pitfalls (and fixes)

Pitfall: tests are too tied to implementation details

Fix: rewrite tests to focus on public behavior (function outputs, schemas, CLI output), not internal helper functions.
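
A hypothetical before/after, using a made-up slugify() function:

  from mytool.text import slugify, _split_words  # hypothetical module

  # Too coupled: reaches into a private helper, so any refactor breaks it.
  def test_split_words_helper():
      assert _split_words("Hello World") == ["hello", "world"]

  # Behavior-focused: asserts only what callers can observe.
  def test_slugify_lowercases_and_joins_with_dashes():
      assert slugify("Hello World") == "hello-world"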

Pitfall: tests don’t fail before implementation

Fix: add assertions or cases that clearly require the new behavior. A test that always passes is just noise.

Pitfall: tests are flaky

Fix: remove time/network randomness; use fixed inputs; avoid relying on ordering unless the order is part of the contract.
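
For the time case, one common fix (sketched with hypothetical names) is to pass the clock in instead of reading it inside the function:

  from datetime import datetime, timezone

  # Take "now" as a parameter instead of calling datetime.now() internally.
  def make_report_name(prefix: str, now: datetime) -> str:
      return f"{prefix}-{now:%Y%m%d}.txt"

  def test_report_name_is_deterministic():
      fixed = datetime(2024, 1, 2, tzinfo=timezone.utc)
      assert make_report_name("daily", now=fixed) == "daily-20240102.txt"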

Pitfall: the model “fixes” by weakening the contract

Fix: restate acceptance criteria and instruct: “Do not change expected behavior. Only change implementation.”

Ship points for tests-first

  • SP1: tests added and reviewed; they fail for the right reason.
  • SP2: implementation passes tests with minimal diff.
  • SP3: refactor/cleanup (optional) with tests still green; commit.

Where to go next