3.3 Common workflows: chat, structured output, tool calling, multimodal
On this page
- Why “workflow selection” matters
- Workflow: Chat (planning, reasoning, iteration)
- Workflow: Structured output (schemas, contracts)
- Workflow: Tool calling (grounding, doing real work)
- Workflow: Multimodal (images/docs/audio)
- Hybrid workflows (how to combine them)
- How to choose the right workflow quickly
- Common pitfalls (and fixes)
- Copy-paste templates
- Where to go next
Why “workflow selection” matters
AI Studio offers multiple ways to interact with models: chat, structured output, tool calling, and (sometimes) multimodal. These are not cosmetic UI differences. They change how you should write prompts, how you verify results, and what failure modes you’re likely to hit.
Most “AI is inconsistent” problems are actually “wrong workflow for the job” problems.
Use chat to think, structured output to contract, and tools to verify. Use multimodal when text isn’t the whole input.
Workflow: Chat (planning, reasoning, iteration)
Chat is best for: planning, tradeoffs, debugging hypotheses, and iterative refinement. It’s your “conversation workspace.”
When chat is the right tool
- You’re exploring approaches (“Give me 3 options with tradeoffs”).
- You’re asking for a plan before implementation.
- You’re debugging with logs/errors and need hypotheses.
- You’re doing a repo tour and building a mental model.
- You want an explanation of an existing system.
How to use chat well
- Constrain outputs: “Plan in 5 steps,” “diff-only,” “ask questions first.”
- Keep the working set small: provide only the relevant file snippets and evidence.
- Require verification steps: every recommendation should come with “what to check.”
- Reset when needed: long chat history causes drift; summarize and restart.
Common failure modes in chat
- Vibes instead of constraints: output looks helpful but misses requirements.
- Hallucinated facts: invented APIs/UI labels when context is missing.
- Over-long answers: too much explanation, not enough actionable diff.
Use chat to converge on the next small action. Then export the result and verify it in code.
Workflow: Structured output (schemas, contracts)
Structured output is best when your application needs machine-readable results. This is where you stop “parsing vibes” and start enforcing contracts.
When structured output is the right tool
- Your app will parse the model’s response.
- You need reliable fields (enums, required keys, nested objects).
- You want predictable formatting across many runs.
- You want validation and clear error handling when outputs are invalid.
How to use structured output well
- Define a schema: make fields explicit, prefer enums where possible.
- Keep it tight: fewer degrees of freedom = fewer failures.
- Provide examples: one correct output example anchors behavior.
- Use conservative sampling: lower temperature (and top-p) reduces randomness and improves schema compliance.
- Validate every response: treat invalid JSON as a normal failure mode.
Common failure modes in structured output
- Partial/invalid JSON: truncation, missing braces, wrong types.
- Schema drift: extra fields, missing required fields, wrong enum values.
- Hidden assumptions: model fills fields with guessed facts.
Once your outputs are structured, you can build tests and evals around them. This is how you make quality measurable over time.
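The validation loop above can be sketched with only the standard library. This is a minimal, illustrative example; the schema fields (`title`, `priority`, `tags`) and the enum values are hypothetical stand-ins for your real contract. Note that invalid input returns an error list rather than raising, because invalid output is a normal failure mode you handle, not an exception.

```python
import json

# Illustrative contract: required fields with expected types, plus one enum.
SCHEMA = {
    "title": str,
    "priority": str,  # must be one of PRIORITIES
    "tags": list,
}
PRIORITIES = {"low", "medium", "high"}

def validate_response(raw: str):
    """Parse a model response and check it against the contract.

    Returns (data, errors); a non-empty errors list is an expected
    outcome to retry or log, not an exception to crash on.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc}"]

    errors = []
    for field, expected_type in SCHEMA.items():
        if field not in data:
            errors.append(f"missing required field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    if "priority" in data and data.get("priority") not in PRIORITIES:
        errors.append(f"invalid enum value: {data.get('priority')!r}")
    # Catch schema drift: reject unexpected extra fields.
    if isinstance(data, dict):
        for field in data:
            if field not in SCHEMA:
                errors.append(f"unexpected field: {field}")
    return data, errors
```

Running the same validator over a batch of saved responses gives you a pass rate, which is the simplest possible eval to track over time.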
Workflow: Tool calling (grounding, doing real work)
Tool calling is how you separate “thinking” from “doing.” Instead of trusting the model’s claims, you give it tools to check reality or perform controlled actions.
When tool calling is the right tool
- You need grounded answers (“use these sources/files only”).
- You want the model to run checks (tests, linters) and report exact outputs.
- You’re building an app that must interact with external systems safely.
- You want to reduce hallucinations by forcing evidence.
How to use tools safely
- Least privilege: tools should do the minimum necessary.
- Budgets/timeouts: prevent runaway loops and expensive calls.
- Structured tool I/O: clear JSON inputs/outputs reduce mistakes.
- Human approvals: require confirmation for destructive actions.
- Auditability: log tool calls and their results (without leaking secrets).
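The guardrails above can be combined in a small dispatcher. This is a sketch, not a production harness: the tool names (`run_tests`, `delete_branch`) and their stub implementations are hypothetical, and real deployments would add per-call timeouts and secret redaction in the audit log.

```python
import time

# Hypothetical tool registry: each tool is a plain function that takes and
# returns a JSON-serializable dict (structured I/O reduces parsing mistakes).
def run_tests(args):
    return {"passed": 12, "failed": 0}

def delete_branch(args):
    return {"deleted": args["name"]}

TOOLS = {
    "run_tests": {"fn": run_tests, "destructive": False},
    "delete_branch": {"fn": delete_branch, "destructive": True},
}

MAX_CALLS = 10  # budget: prevents runaway loops

def dispatch(call_log, name, args, approved=False):
    """Execute one tool call under guardrails; successes go to the audit log."""
    if len(call_log) >= MAX_CALLS:
        return {"error": "call budget exhausted"}
    spec = TOOLS.get(name)
    if spec is None:
        # Least privilege: only allowlisted tools exist at all.
        return {"error": f"unknown tool: {name}"}
    if spec["destructive"] and not approved:
        # Human approval gate for destructive actions.
        return {"error": "human approval required"}
    result = spec["fn"](args)
    call_log.append({"tool": name, "args": args, "result": result, "ts": time.time()})
    return result
```

The key design choice is that the allowlist, budget, and approval gate all live outside the model: the model can only request calls, never bypass the checks.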
Common failure modes in tool calling
- Tool misuse: wrong parameters, wrong assumptions about tool behavior.
- Runaway loops: repeated calls without convergence.
- Fragile parsing: tool returns text; model misreads it.
- Security risks: tools that can do too much damage.
Tool calling is where “AI becomes an app.” Treat it like production engineering: guardrails, budgets, approvals, and audits.
Workflow: Multimodal (images/docs/audio)
Multimodal workflows matter when text isn’t enough: UI screenshots, diagrams, PDFs, or recordings. They can unlock powerful debugging and extraction workflows, but they add privacy and safety concerns.
When multimodal is the right tool
- Debugging UI layout issues from screenshots.
- Extracting structured information from documents/images.
- Summarizing meetings or audio notes into actions.
- Generating test cases from visual artifacts.
How to use multimodal safely
- Minimize sensitive data: redact user info, secrets, and internal identifiers.
- Ask for uncertainty: “If you can’t read a detail, say so.”
- Prefer structured extraction: output JSON with confidence/unknowns.
- Don’t over-trust screenshots: verify behavior in code; images can mislead.
Small visual ambiguities can create confident mistakes. Treat visual extraction as a hypothesis that you validate.
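One way to enforce the "structured extraction with confidence/unknowns" advice is to normalize whatever the model returns into a fixed shape. A minimal sketch, assuming hypothetical extraction targets (`invoice_number`, `total`, `due_date`):

```python
import json

FIELDS = ["invoice_number", "total", "due_date"]  # illustrative targets
CONFIDENCES = {"low", "medium", "high"}

def normalize_extraction(raw: str):
    """Coerce a model's extraction output into the fields/confidence contract.

    Unknown, missing, or malformed entries become
    {"value": None, "confidence": "low"}, so downstream code always
    sees the same shape and can route low-confidence fields to review.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    result = {}
    for field in FIELDS:
        entry = data.get(field) if isinstance(data, dict) else None
        if (isinstance(entry, dict)
                and "value" in entry
                and entry.get("confidence") in CONFIDENCES):
            result[field] = {"value": entry["value"],
                             "confidence": entry["confidence"]}
        else:
            result[field] = {"value": None, "confidence": "low"}
    return result
```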
Hybrid workflows (how to combine them)
Real projects often combine workflows. A common, effective pattern:
- Chat to plan and identify risks.
- Structured output to define the contract.
- Tools to verify and ground behavior.
- Repo to implement, test, and ship.
Example hybrid flow
- Chat: “Propose 3 schema designs and tradeoffs.”
- Structured: “Output JSON matching schema v2 only.”
- Tools: “Validate schema, run tests, show exact failures.”
- Repo: “Commit prompt + schema + validator + smoke tests.”
If you feel stuck, you’re often in the wrong workflow. Move from chat → structure → tools as the need for correctness increases.
How to choose the right workflow quickly
Use this quick decision table:
- Need options/tradeoffs? Chat.
- Need machine-readable output? Structured output.
- Need grounded truth or real actions? Tools.
- Need images/docs/audio? Multimodal.
And remember the “risk rule” from Part I: as blast radius increases, move toward structured outputs, tools, and verification gates.
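The decision table and risk rule can be expressed as a tiny routing function. This is one possible precedence, not the only valid one; the flag names are illustrative.

```python
def pick_workflow(needs_options=False, needs_parsing=False,
                  needs_grounding=False, has_media=False):
    """Map the decision table onto a single recommendation.

    Precedence reflects rising correctness requirements: non-text
    inputs and grounding take priority over open-ended chat.
    """
    if has_media:
        return "multimodal"
    if needs_grounding:
        return "tool calling"
    if needs_parsing:
        return "structured output"
    # Default: start loose in chat, tighten as blast radius grows.
    return "chat"
```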
Common pitfalls (and fixes)
Pitfall: using chat for JSON contracts
Fix: switch to structured output and validate results.
Pitfall: using structured output for brainstorming
Fix: brainstorm in chat with multiple candidates; then lock a schema.
Pitfall: giving tools too much power
Fix: least privilege, budgets, approvals, and audit logs.
Pitfall: leaking sensitive info via images/docs
Fix: redact, minimize, and treat all uploads as potentially sensitive.
If you don’t define what “done” means (and how you’ll verify), any workflow will feel inconsistent.
Copy-paste templates
Template: chat planning prompt
Goal:
...
Constraints:
- ...
Ask:
1) Propose 3 approaches with tradeoffs.
2) Recommend one and explain why.
3) List risks and verification steps.
Template: structured output prompt
Task:
...
Output requirements:
- Output JSON only
- Must match this schema:
{ ... }
Examples:
- Input: ...
Output: ...
Template: tool-grounded prompt
Use tools to verify before claiming success.
If a tool fails, report the exact error and propose the smallest next step.
Never guess file contents or command outputs.
Template: multimodal extraction prompt
Given this image/document:
1) Extract the requested fields as JSON.
2) For each field, include confidence (low/medium/high).
3) If unreadable/uncertain, set value to null and explain why.