Home/ Part XIII — Expert Mode: Systems, Agents, and Automation/42. Fine-Tuning vs Prompting vs Retrieval (Decision Framework)/42.4 Evaluation before and after tuning

42.4 Evaluation before and after tuning

Overview and links for this section of the guide.

On this page

Evaluation Strategy
Key Metrics
Where to go next

Evaluation Strategy

// Always evaluate BEFORE fine-tuning to establish baseline
const baseline = await evaluateModel(baseModel, testSet);

// Train
const fineTuned = await trainModel(baseModel, trainingData);

// Evaluate after
const afterTuning = await evaluateModel(fineTuned, testSet);

// Compare
console.log('Improvement:', {
  accuracy: afterTuning.accuracy - baseline.accuracy,
  format: afterTuning.formatCompliance - baseline.formatCompliance
});

Key Metrics

Task accuracy: Does it get the right answer?
Format compliance: Does it match the expected schema?
General capability: Did we regress on other tasks?

Where to go next

42.5 Deployment and rollback