13.5 Versioning prompts (treat prompts like code)
Overview and links for this section of the guide.
On this page
- Why prompts must be versioned
- What to version (prompts, schemas, settings)
- A practical repo structure for prompts
- Prompt IDs and versions (how to name things)
- Changing prompts safely (review + verification)
- Logging prompt versions in your app
- Migration and backwards compatibility
- Copy-paste templates
- Where to go next
Why prompts must be versioned
In AI apps, prompt changes are behavior changes. If you don’t version prompts, you will experience:
- silent regressions (“it used to work”),
- inconsistent outputs across environments,
- inability to reproduce good runs,
- debugging that turns into guessing.
Versioning prompts makes AI behavior manageable the same way versioning code makes software manageable.
Even if your “code” is unchanged, a prompt update can change results. Treat prompts as first-class artifacts.
What to version (prompts, schemas, settings)
Versioning only the prompt text is not enough. You want a complete “behavior bundle”:
- Prompt templates: system/house rules and task prompts.
- Schemas: JSON schemas or structured-output contracts.
- Model settings: key parameters that affect output (temperature, max output).
- Model choice: model name/version used for a given prompt version.
At minimum, prompts + schemas should be versioned in git.
A practical repo structure for prompts
A simple structure that scales:
src/
llm/
prompts/
system.md
tasks/
summarize/
v1.md
v2.md
extract_fields/
v1.md
schemas/
summarize/
v1.json
v2.json
extract_fields/
v1.json
This makes it obvious:
- which tasks exist,
- which versions are available,
- which schema matches which prompt.
Put stable “house rules” in a shared system.md and keep task prompts focused. This reduces duplication and drift.
Prompt IDs and versions (how to name things)
Pick a naming scheme you can log and search easily. A practical scheme:
- Prompt ID:
summarize,extract_fields,answer_with_sources - Prompt version:
v1,v2, … (or semantic versions if you prefer) - Full identifier:
summarize@v2
Then your app can log: prompt_id=summarize, prompt_version=v2, schema_version=v2.
If you changed behavior, bump the version. If you only fixed formatting with no behavior change, you may not need a bump—but be honest.
Changing prompts safely (review + verification)
Prompt changes should follow a mini engineering workflow:
- Write acceptance criteria: what must remain true?
- Update prompt version: create
v2rather than editingv1in place (safer early on). - Update schema if needed: keep schema/prompt aligned.
- Run a small eval set: 10–25 examples that matter (even manual early).
- Review diffs: prompt diffs are behavior diffs.
- Roll out deliberately: use a config flag to choose
v1vsv2.
If you overwrite the prompt that produced yesterday’s behavior, you lose the ability to reproduce and debug. Add v2 instead.
Logging prompt versions in your app
Prompt version logging is the difference between “we can debug this” and “we’re guessing.” Log:
- prompt id and version,
- schema version (if structured),
- model name,
- key settings (temperature),
- outcome category and latency.
This connects production behavior back to a specific artifact in git.
Migration and backwards compatibility
Prompt and schema changes can break consumers. Reduce breakage by:
- keeping old versions available for a while,
- introducing new fields as optional before making them required,
- supporting multiple schema versions in the parser/validator temporarily,
- rolling out behind a feature flag (or environment config).
Even small apps benefit from this discipline, because it keeps iteration safe.
Copy-paste templates
Template: prompt file header
# Prompt: summarize@v2
Purpose:
- Summarize an article into structured bullets.
Inputs:
- article_text: string
Output:
- JSON matching schema summarize/v2.json
Constraints:
- No hallucinated facts; stay grounded in provided text
- If text is missing/ambiguous, say so
- Be concise and structured
Template: prompt changelog entry
summarize@v2
- Changed: added explicit grounding rules
- Changed: tightened schema (required fields)
- Added: handling for empty input
- Notes: expect fewer hallucinations; slightly more refusals on ambiguous inputs
Template: prompt version config
Config:
- PROMPT_SUMMARIZE_VERSION=v2
- SCHEMA_SUMMARIZE_VERSION=v2