42.4 Evaluation before and after tuning

Overview and links for this section of the guide.

Evaluation Strategy

// Always evaluate BEFORE fine-tuning to establish baseline
const baseline = await evaluateModel(baseModel, testSet);

// Train
const fineTuned = await trainModel(baseModel, trainingData);

// Evaluate after
const afterTuning = await evaluateModel(fineTuned, testSet);

// Compare
console.log('Improvement:', {
  accuracy: afterTuning.accuracy - baseline.accuracy,
  format: afterTuning.formatCompliance - baseline.formatCompliance
});

Key Metrics

  • Task accuracy: Does it get the right answer?
  • Format compliance: Does it match the expected schema?
  • General capability: Did we regress on other tasks?

Where to go next