32.5 Measuring cost per successful task

The Only Metric That Matters

Don't just measure "cost per token." Measure Cost Per Successful Task.

If Model A costs \$0.01 per run but fails 50% of the time (forcing user retries), and Model B costs \$0.015 per run and works 99% of the time, Model B is actually cheaper: about \$0.020 per successful task versus roughly \$0.015.

$$ \text{True Cost} = \frac{\text{Cost per Request}}{\text{Success Rate}} $$
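
To make the formula concrete, here is a minimal sketch in Python using the hypothetical Model A / Model B figures from the example above; the function name and numbers are illustrative, not part of any real billing API.

```python
def cost_per_success(cost_per_request: float, success_rate: float) -> float:
    """True cost = cost per request / success rate."""
    return cost_per_request / success_rate

# Hypothetical figures from the example above.
model_a = cost_per_success(0.010, 0.50)   # ~$0.0200 per successful task
model_b = cost_per_success(0.015, 0.99)   # ~$0.0152 per successful task
print(f"Model A: ${model_a:.4f}  Model B: ${model_b:.4f}")
```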

How to Measure It

  1. Define Success: The code compiles, the JSON parses, or the user accepts the suggestion without editing it.
  2. Log Everything: Record the model used, the tokens consumed, and the outcome (Success/Fail) for every interaction.
  3. Calculate: Look at your logs. "We spent \$50 on 'Generate SQL' requests and got 100 successful queries, so cost per success = \$0.50." A sketch of this bookkeeping follows the list.
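
Here is one way that bookkeeping could look. This is a minimal sketch: the record fields and the in-memory list are assumptions, and in practice these rows would live in your analytics store rather than a Python list.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float    # what the provider billed for this request
    success: bool      # your definition: compiled, parsed, accepted as-is, ...

log: list[Interaction] = []

def record(interaction: Interaction) -> None:
    """Append one interaction to the task log."""
    log.append(interaction)

def cost_per_success(task_log: list[Interaction]) -> float:
    """Total spend divided by the number of successful outcomes."""
    total_cost = sum(i.cost_usd for i in task_log)
    successes = sum(1 for i in task_log if i.success)
    return total_cost / successes if successes else float("inf")
```

Run `cost_per_success` over the rows for a single task type (e.g. 'Generate SQL') to get the \$0.50 figure from the example.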

Improving the Ratio

Once you know your Cost Per Success, you can optimize:

  • Better Prompts: A clearer prompt might increase the success rate from 60% to 90%, cutting your cost per success by about 33%.
  • Few-Shot Examples: Adding 3 examples might increase input cost by 10%, but if it prevents the user from hitting "Regenerate" 3 times, you saved money.
  • Self-Correction: Asking the model to "double check the code" roughly doubles the token cost, but if it prevents a 5-minute debugging loop for the human, the total system cost (human + AI) is lower (see the sketch below).
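
The sketch below plugs these levers into the formula. The 60%/90% figures and the 10% few-shot overhead come from the bullets above; the success rates assigned to few-shot and self-correction are assumptions for illustration.

```python
def cost_per_success(cost_per_request: float, success_rate: float) -> float:
    return cost_per_request / success_rate

base          = cost_per_success(0.010, 0.60)  # vague prompt
better_prompt = cost_per_success(0.010, 0.90)  # clearer prompt, same price
few_shot      = cost_per_success(0.011, 0.90)  # +10% input cost for examples
self_check    = cost_per_success(0.020, 0.95)  # 2x cost, assumed 95% success

print(f"baseline        ${base:.4f}")            # ~$0.0167
print(f"better prompt   ${better_prompt:.4f}")   # ~$0.0111 (about 33% less)
print(f"few-shot        ${few_shot:.4f}")        # ~$0.0122
print(f"self-correction ${self_check:.4f}")      # ~$0.0211 -- pays off only
                                                 # once human time is counted
```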

The Human Cost

Never forget that developer time runs \$50–\$100 per hour. Saving \$0.002 on tokens by using a dumb model that then wastes 10 minutes of developer time is bad economics (see the sketch below).
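
A quick back-of-the-envelope sketch makes the point. The \$75/hour rate, the token costs, and the minutes of cleanup are illustrative assumptions in the spirit of the numbers above.

```python
def total_task_cost(model_cost: float, dev_minutes: float,
                    dev_rate_per_hour: float = 75.0) -> float:
    """Model spend plus the developer time the interaction consumes."""
    return model_cost + (dev_minutes / 60.0) * dev_rate_per_hour

# The "cheap" model saves a fraction of a cent in tokens but costs the
# developer an extra 10 minutes of cleanup.
cheap  = total_task_cost(model_cost=0.003, dev_minutes=11.0)  # ~$13.75
better = total_task_cost(model_cost=0.005, dev_minutes=1.0)   # ~$1.26
```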
