38.3 Test-driven agent behavior

The Concept

Test-driven agent behavior means the agent writes tests first, then writes code to pass them. The tests become the specification.

Traditional: Requirement → Code → (Maybe) Tests
TDD Agent:   Requirement → Tests → Code → Verify

Why this works:

Tests force the agent to think about edge cases upfront
Passing tests give concrete evidence that the code meets the specified behavior
Tests serve as documentation of intent
Regression protection is built in
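
To make "tests become the specification" concrete, here is a minimal sketch of the test-first order for a hypothetical requirement ("slugify(title) returns a lowercase, hyphen-separated slug"). The requirement, the `slugify` function, and the `runCases` helper are all illustrative assumptions, and a plain array of cases stands in for a real test framework.

```typescript
// Hypothetical requirement: "slugify(title) returns a lowercase, hyphen-separated slug."

// 1. Tests come first: each case states one expected behavior.
type Case = { name: string; input: string; expected: string };

const cases: Case[] = [
  { name: "lowercases words", input: "Hello World", expected: "hello-world" },
  { name: "collapses repeated separators", input: "a  --  b", expected: "a-b" },
  { name: "trims leading and trailing separators", input: "  Hi! ", expected: "hi" },
  { name: "returns empty string for empty input", input: "", expected: "" },
];

// 2. Minimal implementation, written only after the cases exist.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any run of non-alphanumerics becomes one hyphen
    .replace(/^-+|-+$/g, "");    // strip hyphens at either end
}

// 3. Verify: collect the names of the failing cases.
function runCases(fn: (s: string) => string): string[] {
  return cases.filter(c => fn(c.input) !== c.expected).map(c => c.name);
}

console.log(runCases(slugify)); // an empty array means every case passes
```

Running `runCases` against a placeholder such as `s => s` before the implementation exists is the "expect failures" step: the returned names show which behaviors are still missing.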

TDD Workflow

┌─────────────────────────────────────────────────────────────────┐
│                   TDD AGENT WORKFLOW                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. UNDERSTAND REQUIREMENT                                       │
│     └─▶ Parse issue/task into testable behaviors                │
│                                                                  │
│  2. GENERATE TESTS                                               │
│     └─▶ Write failing tests that define success                 │
│                                                                  │
│  3. RUN TESTS (expect failures)                                 │
│     └─▶ Verify tests fail for the right reasons                 │
│                                                                  │
│  4. GENERATE CODE                                                │
│     └─▶ Write minimal code to pass tests                        │
│                                                                  │
│  5. RUN TESTS (expect passes)                                   │
│     └─▶ All tests should pass                                   │
│                                                                  │
│  6. REFACTOR (optional)                                         │
│     └─▶ Clean up while keeping tests green                      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
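
Step 3 asks the agent to confirm that the tests fail "for the right reasons". One way to approximate that check is sketched below. The `Failure` shape and the error patterns are assumptions chosen for illustration; real patterns depend on the test runner in use.

```typescript
// Assumed failure shape: a test name plus the raw error text from the runner.
interface Failure {
  testName: string;
  error: string;
}

type FailureKind = "expected" | "broken-test";

// Heuristic patterns (assumptions, tune per runner): errors suggesting the
// test file itself is broken rather than the feature being absent.
const BROKEN_TEST_PATTERNS: RegExp[] = [
  /SyntaxError/,
  /Unexpected token/,
  /ReferenceError: (describe|it|test|expect) is not defined/,
];

// A failure is "expected" unless its error matches a broken-test pattern.
function classifyFailure(f: Failure): FailureKind {
  return BROKEN_TEST_PATTERNS.some(p => p.test(f.error)) ? "broken-test" : "expected";
}

// The initial run is healthy only if something fails and nothing is broken.
function failsForRightReasons(failures: Failure[]): boolean {
  return failures.length > 0 && failures.every(f => classifyFailure(f) === "expected");
}
```

If `failsForRightReasons` returns false, regenerating the tests is usually a better next move than generating code against a broken suite.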

Implementation

// tdd-agent.ts
const TEST_GENERATION_PROMPT = `You are a test-first development agent.
Given a requirement, write comprehensive tests BEFORE any implementation.

Rules:
1. Write tests that will FAIL until the feature is implemented
2. Cover happy path, edge cases, and error cases
3. Tests should be independent and not rely on order
4. Use descriptive test names that explain the expected behavior

Output the tests in the format of the project's test framework.`;

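// TDDResult is an assumed name for the result object returned below
// (status, plus optional reason, tests, code, and testResults).
export class TDDAgent {
  async implement(requirement: string): Promise<TDDResult> {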
    // Step 1: Generate tests first
    const tests = await this.generateTests(requirement);
    await this.writeTests(tests);
    
    // Step 2: Run tests - should fail
    const initialRun = await this.runTests();
    if (initialRun.passed === initialRun.total) {
      return {
        status: 'skip',
        reason: 'Tests already pass - feature may already exist'
      };
    }
    
    // Step 3: Generate implementation
    const code = await this.generateImplementation(requirement, tests);
    await this.writeCode(code);
    
    // Step 4: Run tests - should pass
    const finalRun = await this.runTests();
    
    if (finalRun.passed !== finalRun.total) {
      // Try to fix
      const fix = await this.fixFailingTests(finalRun.failures, code);
      await this.writeCode(fix);
      
      const retryRun = await this.runTests();
      return {
        status: retryRun.passed === retryRun.total ? 'success' : 'partial',
        tests: tests,
        code: fix,
        testResults: retryRun
      };
    }
    
    return {
      status: 'success',
      tests,
      code,
      testResults: finalRun
    };
  }
  
  private async generateTests(requirement: string): Promise<string> {
    const response = await this.model.generateContent({
      systemInstruction: TEST_GENERATION_PROMPT,
      contents: [{
        role: 'user',
        parts: [{ text: `Requirement: ${requirement}` }]
      }]
    });
    
    return this.extractCode(response.response.text());
  }
  
  private async generateImplementation(
    requirement: string, 
    tests: string
  ): Promise<string> {
    const implPrompt = `You are implementing a feature to pass these tests.
    
Tests:
\`\`\`
${tests}
\`\`\`

Requirement: ${requirement}

Write the minimal code to make ALL tests pass.`;
    
    const response = await this.model.generateContent({
      contents: [{ role: 'user', parts: [{ text: implPrompt }] }]
    });
    
    return this.extractCode(response.response.text());
  }
}
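
The class above calls `this.extractCode` without showing it. A minimal sketch of such a helper follows, assuming the model wraps its code in a Markdown fence. The behavior (take the first fenced block, fall back to the whole text) is an assumption and not necessarily what the original helper does.

```typescript
// Build the fence marker programmatically so this snippet can itself sit in a fenced block.
const FENCE = "`".repeat(3);

// Matches: opening fence, optional language tag, newline, body (lazy), closing fence.
const FENCED_BLOCK = new RegExp(FENCE + "[\\w-]*\\r?\\n([\\s\\S]*?)" + FENCE);

// Returns the body of the first fenced block, or the trimmed text if none is found.
function extractCode(responseText: string): string {
  const match = FENCED_BLOCK.exec(responseText);
  return (match?.[1] ?? responseText).trim();
}
```

Falling back to the raw text keeps the agent working when the model omits the fence, at the cost of occasionally writing stray prose to disk. A stricter variant could throw instead.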

Handling Test Failures

// test-failure-handler.ts
export class TestFailureHandler {
  async diagnoseAndFix(
    failures: TestFailure[],
    currentCode: string
  ): Promise<string> {
    const diagnosisPrompt = `These tests are failing.
    
Failures:
${failures.map(f => `
Test: ${f.testName}
Expected: ${f.expected}
Actual: ${f.actual}
Error: ${f.error}
`).join('\n')}

Current implementation:
\`\`\`
${currentCode}
\`\`\`

Diagnose the issue and fix the code.`;

    const response = await this.model.generateContent({
      contents: [{ role: 'user', parts: [{ text: diagnosisPrompt }] }]
    });
    
    return this.extractCode(response.response.text());
  }
  
  // Limit retry attempts
  async fixWithRetries(
    failures: TestFailure[],
    currentCode: string,
    maxRetries = 3
  ): Promise<{ code: string; success: boolean }> {
    let code = currentCode;
    
    for (let i = 0; i < maxRetries; i++) {
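      code = await this.diagnoseAndFix(failures, code);
      // Write the candidate fix before re-running; otherwise the tests
      // exercise the old code (assumes a writeCode helper like TDDAgent's).
      await this.writeCode(code);
      const result = await this.runTests();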
      
      if (result.passed === result.total) {
        return { code, success: true };
      }
      
      failures = result.failures;
    }
    
    return { code, success: false };
  }
}
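
The handler relies on `TestFailure`, `runTests`, and a model client that are not shown. The sketch below restates the bounded retry loop as a standalone function so it can be run against fakes. The type shapes are inferred from the fields used above, and `Fixer`, `Runner`, `fakeFix`, and `fakeRun` are hypothetical stand-ins for the model call and the test runner.

```typescript
// Assumed shapes, inferred from the fields this section reads.
interface TestFailure { testName: string; expected: string; actual: string; error: string }
interface TestRunResult { passed: number; total: number; failures: TestFailure[] }

// Hypothetical injected dependencies, so the loop can run without a model.
type Fixer = (failures: TestFailure[], code: string) => Promise<string>;
type Runner = (code: string) => Promise<TestRunResult>;

async function fixWithRetries(
  failures: TestFailure[],
  code: string,
  fix: Fixer,
  run: Runner,
  maxRetries = 3
): Promise<{ code: string; success: boolean; attempts: number }> {
  let current = code;
  let remaining = failures;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    current = await fix(remaining, current);
    const result = await run(current);
    if (result.passed === result.total) {
      return { code: current, success: true, attempts: attempt };
    }
    remaining = result.failures; // feed the fresh failures into the next attempt
  }
  return { code: current, success: false, attempts: maxRetries };
}

// Fakes for demonstration: each "fix" appends a marker; the suite passes after two.
const fakeFix: Fixer = async (_failures, code) => code + "+fix";
const fakeRun: Runner = async (code) => {
  const fixes = code.split("+fix").length - 1;
  const failure: TestFailure = {
    testName: "adds numbers", expected: "3", actual: "NaN", error: "AssertionError",
  };
  return fixes >= 2
    ? { passed: 1, total: 1, failures: [] }
    : { passed: 0, total: 1, failures: [failure] };
};
```

With these fakes, `fixWithRetries([], "base", fakeFix, fakeRun)` succeeds on the second attempt, while passing `maxRetries = 1` returns `success: false`, which is the signal to stop and escalate.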

Tests as Guardrails

Don't let the agent merge code while tests fail. The test suite is your automated reviewer: if the agent can't get the tests to pass within its retry limit, hand the task to a human.
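
The guardrail can be expressed as a small decision function. This is a sketch under assumptions: `mergeGate` and `GateDecision` are hypothetical names, and treating an empty test run as a reason to escalate is a design choice added here, not something stated above.

```typescript
// Minimal result shape: only the counts are needed for the gate.
interface TestRunSummary {
  passed: number;
  total: number;
}

type GateDecision =
  | { action: "merge" }
  | { action: "escalate"; reason: string };

// Allow a merge only when at least one test ran and all of them passed.
function mergeGate(result: TestRunSummary): GateDecision {
  if (result.total === 0) {
    // Assumption: an empty run is suspicious, so it goes to a human.
    return { action: "escalate", reason: "no tests ran" };
  }
  if (result.passed !== result.total) {
    const failing = result.total - result.passed;
    return { action: "escalate", reason: `${failing} of ${result.total} tests failing` };
  }
  return { action: "merge" };
}
```

Returning a reason with every escalation gives the human reviewer a starting point instead of a bare rejection.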

Where to go next

38.4 Static analysis and lint gates