42.4 Evaluation before and after tuning
The Baseline
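Before you tune, measure the base model (Gemini Pro) on your task. It may already score around 90%. Fine-tuning might raise that to 92%, or it might drop it to 80%, for example through catastrophic forgetting, where the model loses abilities it had before tuning. Without a baseline measured on the same examples with the same metric, you cannot tell which of these happened.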
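The before-and-after comparison described above can be sketched as a small harness. This is a minimal illustration, not a prescribed method: it assumes the task can be scored by exact-match accuracy, and `base_predict` and `tuned_predict` are hypothetical callables that you would wrap around your own model calls. No real API is called here.

```python
from typing import Callable, Dict, Sequence, Tuple

# One labeled example: (input text, expected output).
Example = Tuple[str, str]


def accuracy(predict: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Fraction of examples where the prediction exactly matches the label."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(
        1 for text, expected in examples
        if predict(text).strip() == expected.strip()
    )
    return correct / len(examples)


def compare_models(
    base_predict: Callable[[str], str],   # hypothetical wrapper around the base model
    tuned_predict: Callable[[str], str],  # hypothetical wrapper around the tuned model
    examples: Sequence[Example],
) -> Dict[str, float]:
    """Score both models on the same examples and report the change."""
    base = accuracy(base_predict, examples)
    tuned = accuracy(tuned_predict, examples)
    return {"base": base, "tuned": tuned, "delta": tuned - base}
```

A negative `delta` means tuning made the model worse on this task. Exact match is only one possible metric; for free-form outputs you would likely substitute a scoring function suited to your task.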
The Holdout Set
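Set aside 20% of your data as a holdout set and never train on it. Evaluate the tuned model on this set.
If the model scores well on the training data but poorly on the holdout set, it is overfitting: it has memorized the training examples instead of learning the task, so it will not be reliable on new inputs.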
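The holdout split and the overfitting check can be sketched as follows. This is a minimal example using only the Python standard library. The function names, the fixed seed, and the 0.10 gap threshold are illustrative assumptions, not values from this guide.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_holdout(
    data: Sequence[T], holdout_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle deterministically and return (train, holdout)."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    if len(data) < 2:
        raise ValueError("need at least two examples to split")
    items = list(data)
    # A fixed seed keeps the same examples in the holdout set across runs,
    # so they never leak into training by accident.
    random.Random(seed).shuffle(items)
    n_holdout = max(1, round(len(items) * holdout_fraction))
    return items[n_holdout:], items[:n_holdout]


def looks_overfit(
    train_score: float, holdout_score: float, max_gap: float = 0.10
) -> bool:
    """Flag a model whose training score exceeds its holdout score by more
    than max_gap (an assumed threshold; choose one that fits your task)."""
    return (train_score - holdout_score) > max_gap
```

For example, `looks_overfit(0.99, 0.70)` returns `True`, while `looks_overfit(0.92, 0.90)` returns `False`. Split once, before any tuning run, and reuse the same holdout set when comparing the base and tuned models.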