43.3 "Spec → prompt → eval → ship" as a team pipeline
The Pipeline
┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐
│  SPEC  │───▶│ PROMPT │───▶│  EVAL  │───▶│  SHIP  │
└────────┘    └────────┘    └────────┘    └────────┘
    │             │             │             │
    ▼             ▼             ▼             ▼
  Define        Write          Test         Deploy
requirements  the prompt     against       to prod
+ test cases                golden set
Spec Phase
// spec-template.md
# Feature: Order Status Lookup
## User Story
As a customer, I want to ask about my order status in natural language.
## Test Cases (minimum 20)
| Input | Expected Output | Category |
|-------|-----------------|----------|
| "Where's my order #12345?" | Look up #12345, return status | Happy path |
| "I ordered last week" | Ask for order ID | Clarification |
| "Cancel my order" | Explain can't cancel, provide link | Out of scope |
| "You guys suck" | Empathetic response, escalation offer | Negative |
## Quality Bar
- Accuracy: >= 90% on test cases
- Latency: < 2s p95
- Safety: 0 PII leaks, 0 false promises
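To feed the eval phase from the spec, the test-case table has to be read into data. The following is a minimal sketch, assuming the table keeps the three-column layout shown above; `SpecCase` and `parseSpecCases` are hypothetical names, not part of any library, and cells that contain a literal `|` are not handled.

```typescript
// Hypothetical shape for one row of the spec's "Test Cases" table.
interface SpecCase {
  input: string;
  expected: string;
  category: string;
}

// Hypothetical helper: extracts table rows from the spec markdown.
function parseSpecCases(specMarkdown: string): SpecCase[] {
  const cases: SpecCase[] = [];
  for (const line of specMarkdown.split("\n")) {
    const trimmed = line.trim();
    // Table rows start and end with a pipe.
    if (trimmed.charAt(0) !== "|" || trimmed.charAt(trimmed.length - 1) !== "|") continue;
    const cells = trimmed.slice(1, -1).split("|").map((c) => c.trim());
    if (cells.length !== 3) continue;
    // Skip the header row and the |---|---|---| separator row.
    if (cells[0] === "Input" || cells.every((c) => /^:?-+:?$/.test(c))) continue;
    // Drop the double quotes the table puts around inputs.
    const input = cells[0].replace(/^"(.*)"$/, "$1");
    cases.push({ input, expected: cells[1], category: cells[2] });
  }
  return cases;
}
```

A pre-merge check could call `parseSpecCases` on the spec file and reject it when fewer rows come back than the minimum stated in the spec.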
Eval Phase
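// eval-runner.ts
// Assumes loadPrompt, loadEvalSet, model, and judge are defined elsewhere in the project.
// EvalReport is an assumed shape for the return value.
interface EvalReport {
  accuracy: number;
  results: { testCase: { input: string; expected: string }; response: string; passed: boolean }[];
}

async function runEval(promptPath: string, evalSetPath: string): Promise<EvalReport> {
  const prompt = loadPrompt(promptPath);
  const evalSet = loadEvalSet(evalSetPath);

  // An empty set would yield NaN accuracy and slip past the check below
  if (evalSet.cases.length === 0) {
    throw new Error(`Eval set ${evalSetPath} has no cases`);
  }

  // Run every case concurrently and record the judge's verdict
  const results = await Promise.all(
    evalSet.cases.map(async (testCase) => {
      // split/join fills every placeholder and treats "$" in the input literally
      const response = await model.generate(prompt.split('{{input}}').join(testCase.input));
      const passed = await judge(response, testCase.expected);
      return { testCase, response, passed };
    })
  );

  const accuracy = results.filter((r) => r.passed).length / results.length;

  // Block if below bar
  if (accuracy < evalSet.qualityBar) {
    throw new Error(`Accuracy ${accuracy} below bar ${evalSet.qualityBar}`);
  }
  return { accuracy, results };
}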
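A single accuracy number can hide a category that fails outright, for example every "Out of scope" case. The sketch below breaks pass rates down by the Category column from the spec. It assumes each result carries its category; `CategorizedResult` and `accuracyByCategory` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical result shape: one eval outcome tagged with its spec category.
interface CategorizedResult {
  category: string; // e.g. "Happy path", "Out of scope"
  passed: boolean;
}

// Returns the pass rate (0 to 1) for each category that appears in the results.
function accuracyByCategory(results: CategorizedResult[]): Record<string, number> {
  // Null-prototype object so category names never collide with inherited keys.
  const tallies: Record<string, { passed: number; total: number }> = Object.create(null);
  for (const r of results) {
    const tally = tallies[r.category] || { passed: 0, total: 0 };
    tally.total += 1;
    if (r.passed) tally.passed += 1;
    tallies[r.category] = tally;
  }
  const accuracy: Record<string, number> = {};
  for (const category of Object.keys(tallies)) {
    accuracy[category] = tallies[category].passed / tallies[category].total;
  }
  return accuracy;
}
```

Reporting this breakdown next to the overall accuracy makes it easier for a reviewer to see which kind of input a prompt change regressed.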
Ship Phase
# CI/CD integration
jobs:
  prompt-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2  # HEAD~1 must exist for the diff below
      - name: Detect prompt changes
        id: changes
        run: |
          echo "prompts=$(git diff --name-only HEAD~1 | grep 'prompts/' | tr '\n' ' ')" >> "$GITHUB_OUTPUT"
      - name: Run evals
        if: steps.changes.outputs.prompts != ''
        run: npm run eval -- ${{ steps.changes.outputs.prompts }}
      - name: Deploy if passed
        # Deploy only when prompts changed and the eval step succeeded
        if: success() && steps.changes.outputs.prompts != ''
        run: npm run deploy-prompts
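The workflow hands the eval script a list of changed prompt files, while `runEval` takes one prompt path and one eval set path. One way to bridge the two is sketched below. The `prompts/<name>.<ext>` to `evals/<name>.json` convention is an assumption made for this example, not something the guide prescribes, and `evalSetPathFor` and `planEvalRuns` are hypothetical names.

```typescript
// Assumed convention: prompts/<name>.<ext> is evaluated against evals/<name>.json.
function evalSetPathFor(promptPath: string): string {
  const fileName = promptPath.split("/").pop() || promptPath;
  const dot = fileName.lastIndexOf(".");
  const baseName = dot > 0 ? fileName.slice(0, dot) : fileName;
  return "evals/" + baseName + ".json";
}

// Pairs each changed prompt file with its eval set and ignores other arguments.
function planEvalRuns(changedFiles: string[]): { promptPath: string; evalSetPath: string }[] {
  return changedFiles
    .filter((file) => file.indexOf("prompts/") === 0)
    .map((file) => ({ promptPath: file, evalSetPath: evalSetPathFor(file) }));
}
```

The script behind `npm run eval` could pass its command-line arguments to `planEvalRuns`, call `runEval` once per pair, and exit with a non-zero status if any call throws, which is what stops the deploy step.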