36.5 Guardrails: avoid confident math errors

Overview and links for this section of the guide.

The LLM Math Problem

If you ask an LLM "What is 1234 * 5678?", it might get the answer wrong. LLMs are language models, not calculators. They predict tokens, not compute arithmetic.

Common failure modes:

  • Wrong last digit in multiplication
  • Incorrect carry operations
  • Rounding errors mentioned incorrectly
  • Percentage calculations off by factors

The Code Solution

Never let the LLM do the math. Instead:

  1. LLM writes code that does the math
  2. Python executes the code (100% accurate)
  3. Display the computed result
// Good vs Bad approach

// ❌ BAD: Asking LLM to calculate
Prompt: "Calculate the sum of the sales column."
LLM: "The sum is approximately $1,234,567."  // Might be wrong!

// ✅ GOOD: LLM writes code
Prompt: "Write code to calculate the sum of the sales column."
LLM: "df['sales'].sum()"
Python: 1234567.89  // Exactly correct
The Golden Rule of Analytics

Code is the calculator. The model is the operator. The model tells the code what to do; it never does the math itself.

Output Verification

Even with code execution, verify outputs make sense:

// output-verification.ts
interface VerificationResult {
  passed: boolean;
  warnings: string[];
  errors: string[];
}

export function verifyNumericOutput(
  result: any,
  context: { columnName?: string; expectedRange?: [number, number]; rowCount?: number }
): VerificationResult {
  const warnings: string[] = [];
  const errors: string[] = [];
  
  // Check for common impossible values
  if (typeof result === 'number') {
    if (isNaN(result)) {
      errors.push('Result is NaN - calculation may have divided by zero');
    }
    if (!isFinite(result)) {
      errors.push('Result is infinite - check for overflow');
    }
    if (result < 0 && context.columnName?.toLowerCase().includes('count')) {
      errors.push('Negative count - this should not be possible');
    }
  }
  
  // Check aggregation sanity
  if (Array.isArray(result)) {
    const numericValues = result.filter(r => typeof r === 'number' && isFinite(r));
    const max = Math.max(...numericValues);
    const min = Math.min(...numericValues);
    
    if (max / min > 1000000) {
      warnings.push('Very large range in results - verify this is expected');
    }
  }
  
  // Check against expected range
  if (context.expectedRange) {
    const [min, max] = context.expectedRange;
    if (result < min || result > max) {
      warnings.push(`Result ${result} outside expected range [${min}, ${max}]`);
    }
  }
  
  return {
    passed: errors.length === 0,
    warnings,
    errors
  };
}

Implementation

// math-guardrails.ts
export class MathGuardrails {
  // Ensure the LLM never attempts direct calculation
  sanitizePrompt(question: string): string {
    const calculationPatterns = [
      /calculate\s+the\s+(sum|average|mean|total)/gi,
      /what\s+is\s+\d+\s*[\+\-\*\/]\s*\d+/gi,
      /how\s+much\s+is/gi,
    ];
    
    let sanitized = question;
    for (const pattern of calculationPatterns) {
      if (pattern.test(question)) {
        // Rephrase to code generation
        sanitized = question.replace(pattern, 'write code to calculate the $1');
      }
    }
    
    return sanitized;
  }
  
  // Add verification step after execution
  async analyzeWithVerification(
    analyst: DataAnalyst,
    schema: string,
    question: string,
    dataPath: string
  ): Promise {
    // Generate and execute
    const result = await analyst.analyze(schema, question, dataPath);
    
    // Verify the output
    const verification = verifyNumericOutput(result.output.data, {
      columnName: this.extractTargetColumn(question),
    });
    
    if (!verification.passed) {
      throw new VerificationError(verification.errors.join('; '));
    }
    
    return {
      ...result,
      verification
    };
  }
  
  // Double-check percentages
  verifyPercentage(value: number, description: string): void {
    if (value < 0) {
      throw new Error(`${description} is negative (${value}%) - check calculation`);
    }
    if (value > 100 && !description.includes('growth')) {
      throw new Error(`${description} exceeds 100% (${value}%) - might need normalization`);
    }
  }
}

Common Math Errors to Catch

Error Type Example Detection
Division by zero NaN or Infinity result isNaN() or !isFinite()
Percentage > 100% Market share = 150% Range check
Negative counts -5 orders Sign check on count columns
Future dates Data from 2030 Date range validation
Null aggregation sum() on all nulls Check for None/NaN result
// Sanity checks for common analytics outputs
function sanityCheckAnalytics(
  query: string,
  result: any
): string[] {
  const warnings: string[] = [];
  
  // Revenue/sales should be positive
  if (query.match(/revenue|sales|income/i) && result < 0) {
    warnings.push('Negative revenue detected - verify this is intentional');
  }
  
  // Counts should be integers
  if (query.match(/count|number of|how many/i) && !Number.isInteger(result)) {
    warnings.push('Non-integer count - rounding may have occurred');
  }
  
  // Rates should be between 0 and 1 (or 0 and 100 for percentages)
  if (query.match(/rate|ratio|percentage/i)) {
    if (result > 100 || result < 0) {
      warnings.push('Rate/percentage outside normal range');
    }
  }
  
  return warnings;
}
Always Show the Code

When displaying results, always show the calculation code. If users see df['revenue'].sum(), they can verify it themselves. If they just see "$1.2M", they have to trust blindly.

Where to go next