Part XIV — Team Workflows and "Shipping With Adults in the Room" › 43. Collaboration Patterns
43.1 Prompt reviews like code reviews
Why Review Prompts
Prompts are code. They determine system behavior, shape user experience, and can introduce bugs just as easily as any function. They deserve the same review rigor as the rest of your codebase.
// This innocent change broke everything
- "You are a helpful customer support agent."
+ "You are an extremely helpful and eager customer support agent who always says yes."
// Result: Agent started promising refunds we couldn't honor
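An incident like this can be turned into a permanent regression check. Below is a minimal sketch of such a check: `promisesExcessiveRefund` is a hypothetical guard (not from the original) that scans a model response for refund amounts above the $50 escalation threshold. A real eval would run the prompt against the model and assert on the responses; the regex is a stand-in for that assertion step.

```typescript
// Hypothetical guard derived from the incident above: flag any response
// that promises a refund over the escalation limit. A real eval suite
// would generate `response` by running the prompt against the model.
const REFUND_LIMIT = 50;

function promisesExcessiveRefund(response: string): boolean {
  // Match a dollar amount appearing shortly after refund language.
  const match = response.match(/refund(?:ed)?(?:\s+\w+){0,4}\s+\$(\d+(?:\.\d{2})?)/i);
  if (!match) return false;
  return parseFloat(match[1]) > REFUND_LIMIT;
}
```

Checks like this are cheap to write right after a prompt-caused incident, and they keep the next "innocent change" from reintroducing the same behavior.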
Review Process
// prompts/customer-support.md
---
version: 2.3.0
author: [email protected]
reviewers: [[email protected], [email protected]]
eval_set: golden-support-v2
last_tested: 2024-01-15
accuracy: 94.2%
---
You are a customer support agent for Acme Corp.
## Rules
- Never promise refunds > $50 without escalation
- Always verify order ID before discussing details
- Escalate any mention of legal action
## Tone
- Professional but warm
- Apologize for issues, don't blame the customer
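The metadata header makes the file machine-checkable: CI can parse it and refuse prompts that are missing a version, eval set, or reviewers. Here is a minimal frontmatter parser, assuming the simple `key: value` format of the example above (a production setup would use a real YAML parser):

```typescript
// Minimal sketch: split the "---"-delimited frontmatter header from the
// prompt body and read its key: value pairs. Assumes flat keys as in
// the example file; nested YAML would need a proper parser.
interface PromptMeta {
  [key: string]: string;
}

function parseFrontmatter(source: string): { meta: PromptMeta; body: string } {
  const match = source.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { meta: {}, body: source };
  const meta: PromptMeta = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2] };
}

const file = `---
version: 2.3.0
eval_set: golden-support-v2
---
You are a customer support agent for Acme Corp.`;

const { meta, body } = parseFrontmatter(file);
```

Once parsed, enforcing "every prompt names an eval set" is a one-line CI assertion on `meta`.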
Review Checklist
## Prompt Review Checklist
### Safety
- [ ] No new capabilities that could be misused
- [ ] Sensitive data handling is appropriate
- [ ] Escalation rules are maintained
- [ ] No prompt injection vulnerabilities
### Quality
- [ ] Clear, unambiguous instructions
- [ ] Examples are representative
- [ ] Edge cases are handled
- [ ] Output format is specified
### Testing
- [ ] Eval suite passes (>= previous accuracy)
- [ ] New test cases added for new behavior
- [ ] Manual spot-check completed
- [ ] No regression on existing capabilities
### Documentation
- [ ] Version number updated
- [ ] Changelog entry added
- [ ] Breaking changes documented
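Some of the documentation items can be enforced automatically rather than relying on reviewers to remember them. The sketch below (hypothetical helpers, not from the original) checks two of them: the version number was bumped, and the changelog mentions the new version.

```typescript
// Naive semver comparison for illustration; a real setup would use a
// semver library.
function semverGreater(a: string, b: string): boolean {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if ((pa[i] ?? 0) !== (pb[i] ?? 0)) return (pa[i] ?? 0) > (pb[i] ?? 0);
  }
  return false;
}

// Returns a list of checklist failures; empty means the automated
// documentation checks pass.
function checkMetadata(oldVersion: string, newVersion: string, changelog: string): string[] {
  const failures: string[] = [];
  if (!semverGreater(newVersion, oldVersion)) {
    failures.push("version number not bumped");
  }
  if (!changelog.includes(newVersion)) {
    failures.push("no changelog entry for new version");
  }
  return failures;
}
```

Automating the mechanical items frees reviewers to focus on the judgment calls: safety, quality, and tone.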
Tooling
// prompt-diff.ts
// Show meaningful diffs for prompt changes
function diffPrompts(oldPrompt: string, newPrompt: string): PromptDiff {
// Structural diff, not just text diff
const oldSections = parsePromptSections(oldPrompt);
const newSections = parsePromptSections(newPrompt);
return {
addedSections: newSections.filter(s => !oldSections.find(o => o.name === s.name)),
removedSections: oldSections.filter(s => !newSections.find(n => n.name === s.name)),
modifiedSections: findModifiedSections(oldSections, newSections),
ruleChanges: diffRules(oldPrompt, newPrompt),
toneChanges: detectToneShift(oldPrompt, newPrompt)
};
}
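`diffPrompts` leans on `parsePromptSections`, which is not shown. A minimal version splits the prompt on markdown `## ` headings, as used in the example prompt file earlier, collecting any text before the first heading into a "preamble" section:

```typescript
interface PromptSection {
  name: string;
  content: string;
}

// Minimal sketch of the section parser diffPrompts relies on: one
// section per "## " heading, with leading text under "preamble".
// Assumes prompts follow the heading convention of the example file.
function parsePromptSections(prompt: string): PromptSection[] {
  const sections: PromptSection[] = [];
  let current: PromptSection = { name: "preamble", content: "" };
  for (const line of prompt.split("\n")) {
    if (line.startsWith("## ")) {
      if (current.content.trim()) sections.push(current);
      current = { name: line.slice(3).trim(), content: "" };
    } else {
      current.content += line + "\n";
    }
  }
  if (current.content.trim() || current.name !== "preamble") sections.push(current);
  return sections;
}

const sections = parsePromptSections(
  "You are an agent.\n## Rules\n- rule one\n## Tone\n- warm"
);
```

Diffing at the section level is what makes review comments actionable: "the Rules section changed" is far more useful than a raw text diff across the whole prompt.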
// In CI: fail if accuracy drops by more than two points
// (accuracy stored as a fraction here, e.g. 0.942)
async function promptCICheck(pr: PullRequest): Promise<{ passed: boolean; reason?: string }> {
const changedPrompts = pr.files.filter(f => f.path.startsWith('prompts/'));
for (const prompt of changedPrompts) {
const baseline = await getBaseline(prompt.path);
const newResults = await runEvalSuite(prompt.content);
if (newResults.accuracy < baseline.accuracy - 0.02) {
return { passed: false, reason: `Accuracy dropped: ${baseline.accuracy} → ${newResults.accuracy}` };
}
}
return { passed: true };
}