31.3 Least-privilege tool design
Goal: make tools safe even if the model is tricked
Assume the model will eventually be tricked, confused, or manipulated.
Least-privilege tool design ensures that even if that happens:
- the blast radius is small,
- high-risk actions are blocked or require approval,
- permissions are enforced server-side,
- you can audit what happened.
The system prompt is not a security boundary. Tool permissions, parameter validation, and server-side enforcement are.
Least privilege principles (for tool calling)
- Small tools: one tool does one narrow job.
- Separate read vs write: read-only tools are safer and easier to allow broadly.
- Default deny: tools are unavailable unless explicitly enabled for a mode.
- Server-side checks: permissions and allowlists enforced in code.
- Minimal data: tool outputs should not include raw sensitive records unless necessary.
- Explicit user intent: don’t let tools infer intent from vague prompts.
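As a minimal sketch of the "small tools", "separate read vs write", and "default deny" principles above: the tool names, mode names, and the `call_tool` dispatcher below are hypothetical, and the handlers are stubs. The point is only that a tool is reachable when a mode's allowlist names it, and never otherwise.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool                      # read vs write is declared, not inferred
    handler: Callable[..., object]


# Hypothetical tools: each one does a single narrow job (stub handlers).
TOOLS: Dict[str, Tool] = {
    "get_order_status": Tool("get_order_status", False, lambda order_id: "shipped"),
    "cancel_order": Tool("cancel_order", True, lambda order_id: "cancelled"),
}

# Default deny: a mode only gets the tools listed for it here.
MODE_ALLOWLIST: Dict[str, FrozenSet[str]] = {
    "support_readonly": frozenset({"get_order_status"}),
    "support_agent": frozenset({"get_order_status", "cancel_order"}),
}


class ToolDenied(Exception):
    """Raised when a tool is not enabled for the current mode."""


def call_tool(mode: str, name: str, **kwargs):
    # An unknown mode maps to the empty set, so it can call nothing.
    allowed = MODE_ALLOWLIST.get(mode, frozenset())
    if name not in allowed or name not in TOOLS:
        raise ToolDenied(f"tool {name!r} is not enabled for mode {mode!r}")
    return TOOLS[name].handler(**kwargs)
```

With this shape, `call_tool("support_readonly", "cancel_order", order_id="o1")` raises `ToolDenied` no matter what the model was told in its prompt, because the check runs in application code.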
Tool interface design rules
Design tools like hardened APIs:
- Typed inputs: strict schema (no “freeform query” fields when avoidable).
- Constrained enums: allowed actions and resource types are enumerated.
- Safe defaults: default to “read” and “small scope.”
- Explicit scoping: require resource ids; avoid broad queries.
- No hidden powers: tool should not do “extra helpful” actions.
Common signs of a bad tool design:
- “execute_sql(sql: string)”
- “run_command(cmd: string)”
- “fetch_url(url: string)” without allowlists
If you must have powerful tools, put them behind approvals and narrow allowlists.
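The interface rules above can be sketched as a strict request parser. Everything here is illustrative: the `ResourceType` and `Action` values, the `ToolRequest` shape, and the id format are assumptions, not a prescribed schema. The sketch shows typed inputs, enumerated actions, a safe default of `read`, and rejection of unexpected fields.

```python
import re
from dataclasses import dataclass
from enum import Enum


class ResourceType(Enum):
    INVOICE = "invoice"
    TICKET = "ticket"


class Action(Enum):
    READ = "read"
    ARCHIVE = "archive"


# Assumed id format for this sketch: three lowercase letters, "_", digits.
_ID_PATTERN = re.compile(r"[a-z]{3}_[0-9]{1,12}")


@dataclass(frozen=True)
class ToolRequest:
    resource_type: ResourceType
    resource_id: str
    action: Action = Action.READ      # safe default: read


def parse_request(raw: dict) -> ToolRequest:
    """Turn model-supplied arguments into a validated request or raise ValueError."""
    allowed_keys = {"resource_type", "resource_id", "action"}
    extra = set(raw) - allowed_keys
    if extra:
        # No freeform or hidden fields: unknown keys are an error, not ignored.
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    resource_id = raw.get("resource_id")
    if not isinstance(resource_id, str) or not _ID_PATTERN.fullmatch(resource_id):
        raise ValueError("resource_id is missing or malformed")
    return ToolRequest(
        # Enum lookup raises ValueError for any value that is not enumerated.
        resource_type=ResourceType(raw.get("resource_type")),
        resource_id=resource_id,
        action=Action(raw.get("action", "read")),
    )
```

A request such as `{"resource_type": "invoice", "resource_id": "inv_42", "action": "delete"}` fails at parse time because `delete` is not an enumerated action, so the handler never sees it.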
Scoping and permissions
Scope every tool call:
- User scope: tool actions run as the user (or a scoped service account), not as “admin.”
- Tenant scope: enforce tenant boundaries; make the tenant id a required input to every tool execution, supplied by the server rather than by the model.
- Resource scope: require explicit resource ids and validate ownership/permissions.
Never trust the model to supply “the right tenant.” The app should derive it from the authenticated session.
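A minimal sketch of this scoping, assuming a hypothetical `Session` object produced by your authentication layer and an in-memory `DOCS` store standing in for a real database. The tenant and user come from the session; a `tenant_id` in the model's arguments is never read.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Session:
    user_id: str
    tenant_id: str        # set by the auth layer, never by the model


@dataclass(frozen=True)
class Document:
    doc_id: str
    tenant_id: str
    owner_id: str
    title: str


# Stand-in data store for the sketch.
DOCS: Dict[str, Document] = {
    "d1": Document("d1", "tenant_a", "alice", "Q3 plan"),
    "d2": Document("d2", "tenant_b", "bob", "Payroll"),
}


class Forbidden(Exception):
    """Raised for missing, foreign-tenant, and not-owned documents alike."""


def get_document_title(session: Session, model_args: dict) -> str:
    doc_id = model_args.get("doc_id")
    doc = DOCS.get(doc_id) if isinstance(doc_id, str) else None
    # Tenant check uses the session only; model_args["tenant_id"] is ignored.
    if doc is None or doc.tenant_id != session.tenant_id:
        raise Forbidden("document not found")
    # Resource check: the caller must own the document.
    if doc.owner_id != session.user_id:
        raise Forbidden("document not found")
    return doc.title
```

Returning the same error for "does not exist" and "belongs to someone else" is a deliberate choice in this sketch: it avoids confirming to a manipulated model that a foreign resource id is valid.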
Write operations: approvals, idempotency, reversibility
Write tools are dangerous. Make them safe:
- Require approvals: human-in-the-loop for high-impact actions.
- Idempotency keys: prevent duplicate writes on retries.
- Reversibility: prefer operations you can undo (or at least audit and repair).
- Dry-run mode: tool can return what it would do without doing it.
For many products, the safest pattern is: model proposes changes; humans apply. You still get speed without giving the model a loaded gun.
Tool responses: return minimal data
Tool outputs can leak sensitive info. Safer response patterns:
- return ids and summaries, not full records,
- return counts and aggregates, not raw rows,
- return “safe views” of data with sensitive fields removed,
- require explicit permission to return raw values.
This limits “tool read → model echo” leaks, where the model repeats sensitive tool output in its reply.
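One way to sketch these response patterns is an allowlist-based "safe view". The field names and the `customers:read_raw` permission string are made up for illustration. An allowlist is used rather than a denylist so that a newly added sensitive column stays hidden by default.

```python
from typing import Any, Dict, FrozenSet, List

# Fields that are safe to show the model; everything else is dropped.
SAFE_FIELDS: FrozenSet[str] = frozenset({"id", "name", "plan"})

# Hypothetical permission that unlocks raw values.
RAW_PERMISSION = "customers:read_raw"


def customer_view(record: Dict[str, Any], permissions: FrozenSet[str]) -> Dict[str, Any]:
    """Return the full record only with explicit permission, else a safe view."""
    if RAW_PERMISSION in permissions:
        return dict(record)
    return {key: value for key, value in record.items() if key in SAFE_FIELDS}


def customer_summary(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a count and ids instead of raw rows."""
    return {"count": len(records), "ids": [record["id"] for record in records]}
```

For example, a record with `email` and `card_last4` fields comes back with only `id`, `name`, and `plan` unless the caller's permissions (checked server-side) include the raw-read permission.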
Budgets and abuse prevention
Even safe tools can be abused if their use is unlimited. Set budgets:
- max tool calls per request
- max data size per tool response
- rate limits per user/tenant
- circuit breakers for tool failure
Log tool calls with request ids and store enough metadata to audit without leaking sensitive payloads.
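A minimal sketch of a per-request budget with metadata-only logging, assuming a hypothetical `ToolBudget` wrapper created once per request. It covers the call-count and response-size limits; rate limits per user/tenant and circuit breakers need shared state across requests and are left out. The log line records argument names and sizes, never argument values or payloads.

```python
import json
import logging
import uuid

log = logging.getLogger("tools")


class BudgetExceeded(Exception):
    """Raised when a request uses up its tool-call or response-size budget."""


class ToolBudget:
    def __init__(self, max_calls: int = 5, max_response_bytes: int = 4096):
        self.request_id = uuid.uuid4().hex    # ties all log lines to one request
        self.max_calls = max_calls
        self.max_response_bytes = max_response_bytes
        self.calls = 0

    def run(self, name: str, handler, **kwargs) -> str:
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"max tool calls ({self.max_calls}) reached")
        self.calls += 1
        payload = json.dumps(handler(**kwargs))
        size = len(payload.encode("utf-8"))
        within_limit = size <= self.max_response_bytes
        # Audit metadata only: names and sizes, no values.
        log.info(
            "tool_call request_id=%s tool=%s arg_names=%s response_bytes=%d ok=%s",
            self.request_id, name, sorted(kwargs), size, within_limit,
        )
        if not within_limit:
            raise BudgetExceeded(f"response of {size} bytes exceeds limit")
        return payload
```

Note that an oversized response still counts against the call budget in this sketch, which keeps a looping model from retrying an expensive tool for free.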