31.3 Least-privilege tool design

Goal: make tools safe even if the model is tricked

Assume the model will eventually be tricked, confused, or manipulated.

Least-privilege tool design ensures that even if that happens:

  • the blast radius is small,
  • high-risk actions are blocked or require approval,
  • permissions are enforced server-side,
  • you can audit what happened.

Tools are your real boundary

The system prompt is not a security boundary. Tool permissions, parameter validation, and server-side enforcement are.

Least privilege principles (for tool calling)

  • Small tools: one tool does one narrow job.
  • Separate read vs write: read-only tools are safer and easier to allow broadly.
  • Default deny: tools are unavailable unless explicitly enabled for a mode.
  • Server-side checks: permissions and allowlists enforced in code.
  • Minimal data: tool outputs should not include raw sensitive records unless necessary.
  • Explicit user intent: don’t let tools infer intent from vague prompts.
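As a rough illustration of the default-deny principle, the sketch below keeps a per-mode allowlist and rejects anything not listed. The mode names, tool names, and the `is_tool_allowed` helper are hypothetical and exist only for this example.

```python
from enum import Enum


class Mode(Enum):
    SUPPORT_CHAT = "support_chat"
    ADMIN_CONSOLE = "admin_console"


# Hypothetical allowlist: each mode lists the only tools it may call.
# A tool that is not named here is unavailable in that mode.
ENABLED_TOOLS: dict[Mode, frozenset[str]] = {
    Mode.SUPPORT_CHAT: frozenset({"get_order_status"}),
    Mode.ADMIN_CONSOLE: frozenset({"get_order_status", "propose_refund"}),
}


def is_tool_allowed(mode: Mode, tool_name: str) -> bool:
    """Default deny: unlisted modes and unlisted tools are both rejected."""
    return tool_name in ENABLED_TOOLS.get(mode, frozenset())
```

The check belongs in the server-side dispatcher that executes tool calls, so it still holds if the model asks for a tool it was never shown.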

Tool interface design rules

Design tools like hardened APIs:

  • Typed inputs: strict schema (no “freeform query” fields when avoidable).
  • Constrained enums: allowed actions and resource types are enumerated.
  • Safe defaults: default to “read” and “small scope.”
  • Explicit scoping: require resource ids; avoid broad queries.
  • No hidden powers: tool should not do “extra helpful” actions.
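The interface rules above might look like this in practice: a minimal sketch of a read-only lookup tool whose arguments are parsed into a strict type before anything runs. The `LookupRequest` shape, the resource types, and the limits are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from enum import Enum


class ResourceType(Enum):
    ORDER = "order"
    INVOICE = "invoice"


_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")
MAX_LIMIT = 50  # assumed cap; keeps the default scope small


@dataclass(frozen=True)
class LookupRequest:
    resource_type: ResourceType
    resource_id: str
    limit: int


def parse_lookup_args(raw: dict) -> LookupRequest:
    """Validate model-supplied arguments; reject anything outside the schema."""
    unknown = set(raw) - {"resource_type", "resource_id", "limit"}
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    try:
        resource_type = ResourceType(raw["resource_type"])
    except (KeyError, ValueError):
        raise ValueError("resource_type must be 'order' or 'invoice'") from None
    resource_id = raw.get("resource_id")
    if not isinstance(resource_id, str) or not _ID_PATTERN.fullmatch(resource_id):
        raise ValueError("resource_id is required and must be a plain identifier")
    limit = raw.get("limit", 10)  # safe default: small scope
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be an integer between 1 and {MAX_LIMIT}")
    return LookupRequest(resource_type, resource_id, limit)
```

Rejecting unknown fields outright, rather than ignoring them, makes it visible in logs when the model tries to pass something like a freeform query.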

Common signs of a bad tool design:

  • “execute_sql(sql: string)”
  • “run_command(cmd: string)”
  • “fetch_url(url: string)” without allowlists

If you must have powerful tools, put them behind approvals and narrow allowlists.
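For a tool like “fetch_url,” a narrow allowlist could be sketched as follows. The host names are placeholders, and `is_url_allowed` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlsplit

# Placeholder hosts; a real deployment would load its own short list.
ALLOWED_HOSTS = frozenset({"docs.example.com", "status.example.com"})


def is_url_allowed(url: str) -> bool:
    """Allow only https URLs on the default port whose exact host is allowlisted."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    if parts.username or parts.password:
        return False  # rejects tricks like https://allowed-host@other-host/
    if port not in (None, 443):
        return False
    return parts.hostname in ALLOWED_HOSTS
```

This only vets the URL as written. If the HTTP client follows redirects, each redirect target needs the same check.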

Scoping and permissions

Scope every tool call:

  • User scope: tool actions run as the user (or a scoped service account), not as “admin.”
  • Tenant scope: enforce tenant boundaries; make tenant id a required input to every data access, supplied by the app rather than by the model.
  • Resource scope: require explicit resource ids and validate ownership/permissions.

Never trust the model to supply “the right tenant.” The app should derive it from the authenticated session.
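A minimal sketch of that rule, assuming a hypothetical `Session` object produced by the app's authentication layer and an in-memory store standing in for a database:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user_id: str
    tenant_id: str  # set by the auth layer, never by the model


# Hypothetical store: document id -> (owning tenant id, title).
DOCUMENTS = {
    "doc-1": ("tenant-a", "Q3 plan"),
    "doc-2": ("tenant-b", "Payroll"),
}


def get_document_title(session: Session, model_args: dict) -> str:
    """Read one document title, scoped to the tenant of the authenticated session."""
    doc_id = model_args["document_id"]
    # Any tenant id inside model_args is deliberately ignored.
    record = DOCUMENTS.get(doc_id)
    if record is None or record[0] != session.tenant_id:
        # Same error for "missing" and "not yours" so ids cannot be probed.
        raise PermissionError("document not found")
    return record[1]
```

The model only ever names a resource id; which tenant it is allowed to touch is decided entirely by the session.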

Write operations: approvals, idempotency, reversibility

Write tools are dangerous. Make them safe:

  • Require approvals: human-in-the-loop for high-impact actions.
  • Idempotency keys: prevent duplicate writes on retries.
  • Reversibility: prefer operations you can undo (or at least audit and repair).
  • Dry-run mode: tool can return what it would do without doing it.

Proposal-only is a great default

For many products, the safest pattern is: model proposes changes; humans apply. You still get speed without giving the model a loaded gun.
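Two of the write-safety measures above, idempotency keys and dry-run mode, can be sketched together. The `RefundService` class and its fields are hypothetical; a real service would persist keys in a durable store and wire approvals in front of the non-dry-run path.

```python
from dataclasses import dataclass, field


@dataclass
class RefundService:
    applied: dict = field(default_factory=dict)  # idempotency key -> result
    write_count: int = 0  # number of writes actually performed

    def refund(self, idempotency_key: str, order_id: str,
               amount_cents: int, dry_run: bool = True) -> dict:
        """Propose or apply a refund; dry-run is the default."""
        if amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        plan = {"order_id": order_id, "amount_cents": amount_cents}
        if dry_run:
            # Describe what would happen without changing anything.
            return {"status": "dry_run", **plan}
        if idempotency_key in self.applied:
            # A retry with the same key returns the first result, no second write.
            return self.applied[idempotency_key]
        self.write_count += 1
        result = {"status": "applied", **plan}
        self.applied[idempotency_key] = result
        return result
```

In a proposal-only setup, the model would only ever call this with `dry_run=True`, and the applying call would come from a human-approved path.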

Tool responses: return minimal data

Tool outputs can leak sensitive info. Safer response patterns:

  • return ids and summaries, not full records,
  • return counts and aggregates, not raw rows,
  • return “safe views” of data with sensitive fields removed,
  • require explicit permission to return raw values.

This prevents “tool read → model echo” leaks.
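One way to build such a “safe view” is to allowlist the fields a tool may return, so that newly added sensitive columns stay hidden by default. The field names below are assumptions for illustration.

```python
# Hypothetical allowlist of fields that are safe to show the model.
SAFE_FIELDS = ("id", "status", "created_at")


def to_safe_view(record: dict) -> dict:
    """Keep only allowlisted fields; everything else is dropped."""
    return {key: record[key] for key in SAFE_FIELDS if key in record}


def summarize(records: list[dict]) -> dict:
    """Return a count plus safe views instead of raw rows."""
    return {"count": len(records), "items": [to_safe_view(r) for r in records]}
```

An allowlist is used here instead of a blocklist because a blocklist silently leaks any sensitive field nobody remembered to add to it.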

Budgets and abuse prevention

Even safe tools can be abused if unlimited:

  • max tool calls per request
  • max data size per tool response
  • rate limits per user/tenant
  • circuit breakers for tool failure

Log tool calls with request ids and store enough metadata to audit without leaking sensitive payloads.
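The per-request limits and the audit log can be combined in one small wrapper. This is a sketch under stated assumptions: `RequestBudget` is a hypothetical class, tools are assumed to return strings, and the log is an in-memory list standing in for a real log sink.

```python
import uuid
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request uses more tool capacity than it is allowed."""


@dataclass
class RequestBudget:
    max_calls: int = 5
    max_response_bytes: int = 4096
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    calls: int = 0
    audit_log: list = field(default_factory=list)

    def run(self, tool_name: str, tool, **args) -> str:
        """Run one tool call under the budget and record audit metadata."""
        if self.calls >= self.max_calls:
            raise BudgetExceeded("too many tool calls for this request")
        self.calls += 1
        output = tool(**args)
        size = len(output.encode("utf-8"))
        # Log argument names and sizes only, never argument values or output.
        self.audit_log.append({
            "request_id": self.request_id,
            "tool": tool_name,
            "arg_names": sorted(args),
            "response_bytes": size,
        })
        if size > self.max_response_bytes:
            raise BudgetExceeded("tool response too large")
        return output
```

Rate limits per user or tenant and circuit breakers would sit outside this wrapper, since they span many requests.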
