36.5 Guardrails: avoid confident math errors

On this page

The LLM Math Problem
The Code Solution
Output Verification
Implementation
Common Math Errors to Catch
Where to go next

The LLM Math Problem

If you ask an LLM "What is 1234 * 5678?", it might get the answer wrong. LLMs are language models, not calculators. They predict tokens, not compute arithmetic.

Common failure modes:

Wrong last digit in multiplication
Incorrect carry operations
Rounding errors mentioned incorrectly
Percentage calculations off by factors

The Code Solution

Never let the LLM do the math. Instead:

LLM writes code that does the math
Python executes the code (100% accurate)
Display the computed result

// Good vs Bad approach

// ❌ BAD: Asking LLM to calculate
Prompt: "Calculate the sum of the sales column."
LLM: "The sum is approximately $1,234,567."  // Might be wrong!

// ✅ GOOD: LLM writes code
Prompt: "Write code to calculate the sum of the sales column."
LLM: "df['sales'].sum()"
Python: 1234567.89  // Exactly correct

The Golden Rule of Analytics

Code is the calculator. The model is the operator. The model tells the code what to do; it never does the math itself.

Output Verification

Even with code execution, verify outputs make sense:

// output-verification.ts
interface VerificationResult {
  passed: boolean;
  warnings: string[];
  errors: string[];
}

export function verifyNumericOutput(
  result: any,
  context: { columnName?: string; expectedRange?: [number, number]; rowCount?: number }
): VerificationResult {
  const warnings: string[] = [];
  const errors: string[] = [];
  
  // Check for common impossible values
  if (typeof result === 'number') {
    if (isNaN(result)) {
      errors.push('Result is NaN - calculation may have divided by zero');
    }
    if (!isFinite(result)) {
      errors.push('Result is infinite - check for overflow');
    }
    if (result < 0 && context.columnName?.toLowerCase().includes('count')) {
      errors.push('Negative count - this should not be possible');
    }
  }
  
  // Check aggregation sanity
  if (Array.isArray(result)) {
    const numericValues = result.filter(r => typeof r === 'number' && isFinite(r));
    const max = Math.max(...numericValues);
    const min = Math.min(...numericValues);
    
    if (max / min > 1000000) {
      warnings.push('Very large range in results - verify this is expected');
    }
  }
  
  // Check against expected range
  if (context.expectedRange) {
    const [min, max] = context.expectedRange;
    if (result < min || result > max) {
      warnings.push(`Result ${result} outside expected range [${min}, ${max}]`);
    }
  }
  
  return {
    passed: errors.length === 0,
    warnings,
    errors
  };
}

Implementation

// math-guardrails.ts
export class MathGuardrails {
  // Ensure the LLM never attempts direct calculation
  sanitizePrompt(question: string): string {
    const calculationPatterns = [
      /calculate\s+the\s+(sum|average|mean|total)/gi,
      /what\s+is\s+\d+\s*[\+\-\*\/]\s*\d+/gi,
      /how\s+much\s+is/gi,
    ];
    
    let sanitized = question;
    for (const pattern of calculationPatterns) {
      if (pattern.test(question)) {
        // Rephrase to code generation
        sanitized = question.replace(pattern, 'write code to calculate the $1');
      }
    }
    
    return sanitized;
  }
  
  // Add verification step after execution
  async analyzeWithVerification(
    analyst: DataAnalyst,
    schema: string,
    question: string,
    dataPath: string
  ): Promise {
    // Generate and execute
    const result = await analyst.analyze(schema, question, dataPath);
    
    // Verify the output
    const verification = verifyNumericOutput(result.output.data, {
      columnName: this.extractTargetColumn(question),
    });
    
    if (!verification.passed) {
      throw new VerificationError(verification.errors.join('; '));
    }
    
    return {
      ...result,
      verification
    };
  }
  
  // Double-check percentages
  verifyPercentage(value: number, description: string): void {
    if (value < 0) {
      throw new Error(`${description} is negative (${value}%) - check calculation`);
    }
    if (value > 100 && !description.includes('growth')) {
      throw new Error(`${description} exceeds 100% (${value}%) - might need normalization`);
    }
  }
}

Common Math Errors to Catch

Error Type	Example	Detection
Division by zero	NaN or Infinity result	`isNaN()` or `!isFinite()`
Percentage > 100%	Market share = 150%	Range check
Negative counts	-5 orders	Sign check on count columns
Future dates	Data from 2030	Date range validation
Null aggregation	sum() on all nulls	Check for None/NaN result

// Sanity checks for common analytics outputs
function sanityCheckAnalytics(
  query: string,
  result: any
): string[] {
  const warnings: string[] = [];
  
  // Revenue/sales should be positive
  if (query.match(/revenue|sales|income/i) && result < 0) {
    warnings.push('Negative revenue detected - verify this is intentional');
  }
  
  // Counts should be integers
  if (query.match(/count|number of|how many/i) && !Number.isInteger(result)) {
    warnings.push('Non-integer count - rounding may have occurred');
  }
  
  // Rates should be between 0 and 1 (or 0 and 100 for percentages)
  if (query.match(/rate|ratio|percentage/i)) {
    if (result > 100 || result < 0) {
      warnings.push('Rate/percentage outside normal range');
    }
  }
  
  return warnings;
}

Always Show the Code

When displaying results, always show the calculation code. If users see df['revenue'].sum(), they can verify it themselves. If they just see "$1.2M", they have to trust blindly.

Where to go next

Part XIII — Expert Mode: Systems, Agents, and Automation