16.2 Designing tool interfaces (inputs/outputs) cleanly

Goal: tools that are reliable and safe

A good tool interface makes the “right thing” easy and the “wrong thing” hard.

In practice, that means:

  • inputs are explicit and validated,
  • outputs are structured and stable,
  • errors are categorized and predictable,
  • side effects are gated and idempotent,
  • sensitive data is minimized and protected.

Tool design is API design

Tools are internal APIs that an LLM calls. Treat them like public APIs: version them, validate them, and keep them small.

Tool interface principles

  • Small surface area: fewer tools and fewer parameters per tool.
  • Explicit names: get_order_by_id beats get_order if ambiguity exists.
  • Typed inputs: schema-validated args; avoid free-form “query” strings where possible.
  • Structured outputs: stable JSON objects, not prose.
  • Deterministic where possible: tools should return the same output for the same input.
  • Clear errors: categorize failures so retries are safe.
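
A minimal sketch of these principles in Python. The store, fields, and names are illustrative, not a specific framework:

from dataclasses import dataclass

# Hypothetical in-memory store standing in for a real data source.
ORDERS = {"A1B2C3": {"status": "shipped", "created_at": "2024-01-05"}}

@dataclass(frozen=True)
class GetOrderByIdArgs:
    """Typed, explicit input: one required ID, no free-form query string."""
    order_id: str

def get_order_by_id(args: GetOrderByIdArgs) -> dict:
    """Return a structured envelope with stable keys, never prose."""
    order = ORDERS.get(args.order_id)
    if order is None:
        return {"ok": False, "error": {"category": "not_found",
                                       "message": "no such order",
                                       "retryable": False}}
    return {"ok": True, "data": {"order_id": args.order_id, **order}}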

Input design (make it hard to misuse)

Common input design patterns:

  • Use IDs, not names: “order_id=123” is safer than “order=John’s last order.”
  • Bounded strings: limit lengths; forbid newlines if not needed.
  • Enums: restrict modes to known options.
  • Allowlists: only allow known fields/sorts/filters.
  • Validation: reject invalid inputs before calling external systems.
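
A sketch of these patterns as a pre-flight validator; the field names and limits are illustrative:

import re

ORDER_ID_RE = re.compile(r"^[A-Z0-9-]{6,32}$")   # IDs, not free-form names
ALLOWED_SORTS = {"created_at", "status"}          # allowlist of known fields
ALLOWED_MODES = {"summary", "full"}               # enum-style restriction

def validate_args(args: dict) -> list[str]:
    """Collect every validation error before any external system is called."""
    errors = []
    if not ORDER_ID_RE.fullmatch(args.get("order_id", "")):
        errors.append("order_id must match ^[A-Z0-9-]{6,32}$")
    if args.get("sort", "created_at") not in ALLOWED_SORTS:
        errors.append(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    if args.get("mode", "summary") not in ALLOWED_MODES:
        errors.append(f"mode must be one of {sorted(ALLOWED_MODES)}")
    note = args.get("note", "")
    if len(note) > 200 or "\n" in note:  # bounded string, no newlines
        errors.append("note must be a single line of at most 200 chars")
    return errors
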
Avoid “raw query” tools early

A tool like sql(query: string) is extremely dangerous: it hands the model arbitrary access to your database. Prefer narrow tools like get_order_by_id with validated fields, as in the sketch below.
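
If a database sits behind the tool, the narrow version can still use SQL internally, parameterized and read-only, without ever exposing a query string to the model. A sketch using Python's built-in sqlite3; the table and columns are illustrative:

import sqlite3

def get_order_by_id(conn: sqlite3.Connection, order_id: str) -> dict | None:
    """Narrow, read-only lookup: the model supplies an ID, never SQL."""
    row = conn.execute(
        "SELECT order_id, status, created_at FROM orders WHERE order_id = ?",
        (order_id,),  # parameterized: input cannot change the query's shape
    ).fetchone()
    return dict(zip(("order_id", "status", "created_at"), row)) if row else None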

Output design (make it easy to use)

Tool outputs should be:

  • structured: JSON with stable keys
  • minimal: only the fields the model needs
  • safe: avoid returning sensitive data unless required
  • versioned: include a version id if the shape may evolve

If you return huge blobs, you waste context budget and increase leakage risk.
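
A sketch of an output builder that trims to an explicit field set and stamps a version; the field names are illustrative:

SAFE_FIELDS = ("order_id", "status", "created_at")  # minimal, no PII

def build_output(record: dict) -> dict:
    """Return only allowlisted fields under a versioned, stable shape."""
    return {
        "version": "v1",  # lets consumers detect when the shape evolves
        "data": {k: record[k] for k in SAFE_FIELDS if k in record},
    }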

Error design (machine-readable)

Tools should return errors in a structured way so the model (and your system) can respond correctly.

A practical error envelope:

{
  "ok": false,
  "error": {
    "category": "not_found" | "invalid_input" | "auth" | "rate_limit" | "timeout" | "transient" | "unknown",
    "message": "string",
    "retryable": true | false
  }
}

For success:

{
  "ok": true,
  "data": { ... }
}
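
One way to produce this envelope consistently is a small wrapper that maps exceptions to categories. A sketch, assuming hypothetical exception types for the underlying clients:

# Hypothetical exception types standing in for your client library's errors.
class NotFoundError(Exception): ...
class RateLimitError(Exception): ...

def call_tool(fn, *args, **kwargs) -> dict:
    """Run a tool function and normalize the result into the envelope."""
    try:
        return {"ok": True, "data": fn(*args, **kwargs)}
    except NotFoundError as e:
        return _err("not_found", str(e), retryable=False)
    except RateLimitError as e:
        return _err("rate_limit", str(e), retryable=True)
    except TimeoutError as e:
        return _err("timeout", str(e), retryable=True)
    except Exception as e:
        return _err("unknown", str(e), retryable=False)

def _err(category: str, message: str, retryable: bool) -> dict:
    return {"ok": False, "error": {"category": category,
                                   "message": message,
                                   "retryable": retryable}}
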
Retryability must be explicit

If you want safe retries, the tool should tell you if retrying is safe. Don’t make the model guess.
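
With the flag in the envelope, the calling system can retry mechanically. A sketch that reuses call_tool from the error-design section:

import time

def call_with_retries(fn, *args, max_attempts: int = 3, **kwargs) -> dict:
    """Retry only when the tool itself says the failure is retryable."""
    for attempt in range(1, max_attempts + 1):
        result = call_tool(fn, *args, **kwargs)
        if result["ok"] or not result["error"]["retryable"]:
            return result
        time.sleep(2 ** attempt)  # simple exponential backoff between attempts
    return result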

Idempotency and side effects

Write tools require special care:

  • Idempotency keys: repeated calls should not duplicate actions.
  • Explicit confirmation: require a human-approved “execute” step.
  • Dry-run mode: preview changes before applying.
  • Audit logs: record who/what requested the action and what happened.

A safe pattern is: model proposes a change → human confirms → tool executes.
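
A sketch of that pattern with an idempotency key and an explicit confirm step; the in-memory store and the refund itself are hypothetical:

_executed: dict[str, dict] = {}  # idempotency store; use a database in practice

def refund_order(order_id: str, idempotency_key: str,
                 confirmed: bool = False) -> dict:
    """Write tool: dry-run by default, deduplicated by idempotency key."""
    if idempotency_key in _executed:
        return _executed[idempotency_key]  # repeated call, no duplicate action
    if not confirmed:
        return {"ok": True, "data": {"dry_run": True, "would_refund": order_id}}
    result = {"ok": True, "data": {"refunded": order_id}}  # hypothetical effect
    _executed[idempotency_key] = result
    print(f"AUDIT refund order={order_id} key={idempotency_key}")  # audit log
    return result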

Security controls (least privilege)

Tool security should include:

  • allowlisted tools: only expose needed tools to the model
  • scoped permissions: tools use credentials with least privilege
  • parameter validation: strict schema validation
  • output filtering: redact sensitive fields before returning to the model
  • budgets: max tool calls per request and per minute
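
A sketch of allowlist and budget enforcement at the dispatch layer, reusing call_tool and _err from the error-design section; the registry and limit are illustrative:

TOOLS = {"get_order_by_id": get_order_by_id}  # allowlisted tools only
MAX_CALLS_PER_REQUEST = 5

def dispatch(tool_name: str, args: dict, calls_so_far: int) -> dict:
    """Gatekeeper between a model's tool request and real execution."""
    if tool_name not in TOOLS:
        return _err("invalid_input", f"tool {tool_name!r} is not allowed",
                    retryable=False)
    if calls_so_far >= MAX_CALLS_PER_REQUEST:
        return _err("rate_limit", "tool-call budget exhausted",
                    retryable=False)
    return call_tool(TOOLS[tool_name], **args)
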
Tools are a data exfiltration surface

If a tool can access sensitive data, the model can be tricked, for example by prompt injection, into requesting and echoing it. Minimize and redact what tools can return.
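
A sketch of output filtering before anything reaches the model; the denylist is illustrative, and an allowlist (as in the output-design sketch) is stricter where feasible:

SENSITIVE_KEYS = {"email", "phone", "address", "card_number"}

def redact(obj):
    """Recursively replace sensitive fields before returning tool output."""
    if isinstance(obj, dict):
        return {k: "[REDACTED]" if k in SENSITIVE_KEYS else redact(v)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact(item) for item in obj]
    return obj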

Copy-paste tool spec templates

Template: tool spec

Tool name: get_order_by_id

Purpose:
- Fetch order details needed to answer a user question.

Inputs (schema):
- order_id: string (pattern: ^[A-Z0-9-]{6,32}$)

Outputs:
- ok: boolean
- data (if ok): { order_id, status, created_at, items: [...] }
- error (if !ok): { category, message, retryable }

Security:
- Read-only
- Redact PII fields before returning

Notes:
- Do not return payment details or addresses

Template: tool calling contract for the model

Tool calling rules:
- Call tools only when needed to answer the question.
- Never request sensitive data unless explicitly required by the task.
- Treat tool outputs as the source of truth.
- If a tool returns an error, stop and explain what you need from the user.

Tool design checklist

  • Is the tool’s purpose narrow and explicit?
  • Are inputs schema-validated and bounded?
  • Are outputs structured and minimal?
  • Are errors categorized with explicit retryability?
  • Is sensitive data minimized/redacted?
  • For write tools: idempotency + approval + audit logs?
  • Are budgets and allowlists enforced?
