31.3 Least-privilege tool design

On this page

Goal: make tools safe even if the model is tricked

Assume the model will eventually be tricked, confused, or manipulated.

Least-privilege tool design ensures that even if that happens:

Tools are your real boundary

The system prompt is not a security boundary. Tool permissions, parameter validation, and server-side enforcement are.

Small tools: one tool does one narrow job.
Separate read vs write: read-only tools are safer and easier to allow broadly.
Default deny: tools are unavailable unless explicitly enabled for a mode.
Server-side checks: permissions and allowlists enforced in code.
Minimal data: tool outputs should not include raw sensitive records unless necessary.
Explicit user intent: don’t let tools infer intent from vague prompts.

Design tools like hardened APIs:

A common bad tool design smells like:

If you must have powerful tools, put them behind approvals and narrow allowlists.

Scope every tool call:

User scope: tool actions run as the user (or a scoped service account), not as “admin.”
Tenant scope: enforce tenant boundaries; include tenant id as a required param.
Resource scope: require explicit resource ids and validate ownership/permissions.

Never trust the model to supply “the right tenant.” The app should derive it from the authenticated session.

Write tools are dangerous. Make them safe:

Require approvals: human-in-the-loop for high-impact actions.
Idempotency keys: prevent duplicate writes on retries.
Reversibility: prefer operations you can undo (or at least audit and repair).
Dry-run mode: tool can return what it would do without doing it.

Proposal-only is a great default

For many products, the safest pattern is: model proposes changes; humans apply. You still get speed without giving the model a loaded gun.

Tool outputs can leak sensitive info. Safer response patterns:

This prevents “tool read → model echo” leaks.

Even safe tools can be abused if unlimited:

Log tool calls with request ids and store enough metadata to audit without leaking sensitive payloads.