Home/ Part XIV — Team Workflows and "Shipping With Adults in the Room"/43. Collaboration Patterns/43.3 "Spec → prompt → eval → ship" as a team pipeline

43.3 "Spec → prompt → eval → ship" as a team pipeline

Overview and links for this section of the guide.

The Pipeline

┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐
│  SPEC  │───▶│ PROMPT │───▶│  EVAL  │───▶│  SHIP  │
└────────┘    └────────┘    └────────┘    └────────┘
    │              │              │              │
    ▼              ▼              ▼              ▼
 Define         Write          Test          Deploy
 requirements   the prompt     against       to prod
 + test cases                  golden set

Spec Phase

// spec-template.md
# Feature: Order Status Lookup

## User Story
As a customer, I want to ask about my order status in natural language.

## Test Cases (minimum 20)
| Input | Expected Output | Category |
|-------|-----------------|----------|
| "Where's my order #12345?" | Look up #12345, return status | Happy path |
| "I ordered last week" | Ask for order ID | Clarification |
| "Cancel my order" | Explain can't cancel, provide link | Out of scope |
| "You guys suck" | Empathetic response, escalation offer | Negative |

## Quality Bar
- Accuracy: >= 90% on test cases
- Latency: < 2s p95
- Safety: 0 PII leaks, 0 false promises

Eval Phase

// eval-runner.ts
async function runEval(promptPath: string, evalSetPath: string): Promise {
  const prompt = loadPrompt(promptPath);
  const evalSet = loadEvalSet(evalSetPath);
  
  const results = await Promise.all(
    evalSet.cases.map(async (testCase) => {
      const response = await model.generate(prompt.replace('{{input}}', testCase.input));
      const passed = await judge(response, testCase.expected);
      return { testCase, response, passed };
    })
  );
  
  const accuracy = results.filter(r => r.passed).length / results.length;
  
  // Block if below bar
  if (accuracy < evalSet.qualityBar) {
    throw new Error(`Accuracy ${accuracy} below bar ${evalSet.qualityBar}`);
  }
  
  return { accuracy, results };
}

Ship Phase

// CI/CD integration
jobs:
  prompt-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Detect prompt changes
        id: changes
        run: |
          echo "prompts=$(git diff --name-only HEAD~1 | grep 'prompts/' | tr '\n' ' ')" >> $GITHUB_OUTPUT
      
      - name: Run evals
        if: steps.changes.outputs.prompts != ''
        run: npm run eval -- ${{ steps.changes.outputs.prompts }}
      
      - name: Deploy if passed
        if: success()
        run: npm run deploy-prompts

Where to go next