3.3 Common workflows: chat, structured output, tool calling, multimodal
On this page
- Why “workflow selection” matters
- Workflow: Chat (planning, reasoning, iteration)
- Workflow: Structured output (schemas, contracts)
- Workflow: Tool calling (grounding, doing real work)
- Workflow: Multimodal (images/docs/audio)
- Hybrid workflows (how to combine them)
- How to choose the right workflow quickly
- Common pitfalls (and fixes)
- Copy-paste templates
- Where to go next
Why “workflow selection” matters
AI Studio offers multiple ways to interact with models: chat, structured output, tool calling, and (sometimes) multimodal. These are not cosmetic UI differences. They change how you should write prompts, how you verify results, and what failure modes you’re likely to hit.
Most “AI is inconsistent” problems are actually “wrong workflow for the job” problems.
Use chat to think, structured output to contract, and tools to verify. Use multimodal when text isn’t the whole input.
Workflow: Chat (planning, reasoning, iteration)
Chat is best for: planning, tradeoffs, debugging hypotheses, and iterative refinement. It’s your “conversation workspace.”
When chat is the right tool
- You’re exploring approaches (“Give me 3 options with tradeoffs”).
- You’re asking for a plan before implementation.
- You’re debugging with logs/errors and need hypotheses.
- You’re doing a repo tour and building a mental model.
- You want an explanation of an existing system.
How to use chat well
- Constrain outputs: “Plan in 5 steps,” “diff-only,” “ask questions first.”
- Keep the working set small: provide only the relevant file snippets and evidence.
- Require verification steps: every recommendation should come with “what to check.”
- Reset when needed: long chat history causes drift; summarize and restart.
Common failure modes in chat
- Vibes instead of constraints: output looks helpful but misses requirements.
- Hallucinated facts: invented APIs/UI labels when context is missing.
- Over-long answers: too much explanation, not enough actionable diff.
Use chat to converge on the next small action. Then export the result and verify it in code.
Workflow: Structured output (schemas, contracts)
Structured output is best when your application needs machine-readable results. This is where you stop “parsing vibes” and start enforcing contracts.
When structured output is the right tool
- Your app will parse the model’s response.
- You need reliable fields (enums, required keys, nested objects).
- You want predictable formatting across many runs.
- You want validation and clear error handling when outputs are invalid.
How to use structured output well
- Define a schema: make fields explicit, prefer enums where possible.
- Keep it tight: fewer degrees of freedom = fewer failures.
- Provide examples: one correct output example anchors behavior.
- Use conservative sampling: lower temperature (and top-p) reduces randomness and improves schema compliance.
- Validate every response: treat invalid JSON as a normal failure mode.
Common failure modes in structured output
- Partial/invalid JSON: truncation, missing braces, wrong types.
- Schema drift: extra fields, missing required fields, wrong enum values.
- Hidden assumptions: model fills fields with guessed facts.
Once your outputs are structured, you can build tests and evals around them. This is how you make quality measurable over time.
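The validation loop above can be sketched with only the standard library. This is a minimal, illustrative example; the schema fields (`title`, `priority`, `tags`) and the enum values are hypothetical stand-ins for your real contract. Note that invalid input returns an error list rather than raising, because invalid output is a normal failure mode you handle, not an exception.

```python
import json

# Illustrative contract: required fields with expected types, plus one enum.
SCHEMA = {
    "title": str,
    "priority": str,  # must be one of PRIORITIES
    "tags": list,
}
PRIORITIES = {"low", "medium", "high"}

def validate_response(raw: str):
    """Parse a model response and check it against the contract.

    Returns (data, errors); a non-empty errors list is an expected
    outcome to retry or log, not an exception to crash on.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc}"]

    errors = []
    for field, expected_type in SCHEMA.items():
        if field not in data:
            errors.append(f"missing required field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    if "priority" in data and data.get("priority") not in PRIORITIES:
        errors.append(f"invalid enum value: {data.get('priority')!r}")
    # Catch schema drift: reject unexpected extra fields.
    if isinstance(data, dict):
        for field in data:
            if field not in SCHEMA:
                errors.append(f"unexpected field: {field}")
    return data, errors
```

Running the same validator over a batch of saved responses gives you a pass rate, which is the simplest possible eval to track over time.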
Workflow: Tool calling (grounding, doing real work)
Tool calling is how you separate “thinking” from “doing.” Instead of trusting the model’s claims, you give it tools to check reality or perform controlled actions.
When tool calling is the right tool
- You need grounded answers (“use these sources/files only”).
- You want the model to run checks (tests, linters) and report exact outputs.
- You’re building an app that must interact with external systems safely.
- You want to reduce hallucinations by forcing evidence.
How to use tools safely
- Least privilege: tools should do the minimum necessary.
- Budgets/timeouts: prevent runaway loops and expensive calls.
- Structured tool I/O: clear JSON inputs/outputs reduce mistakes.
- Human approvals: require confirmation for destructive actions.
- Auditability: log tool calls and their results (without leaking secrets).
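The guardrails above can be combined in a small dispatcher. This is a sketch, not a production harness: the tool names (`run_tests`, `delete_branch`) and their stub implementations are hypothetical, and real deployments would add per-call timeouts and secret redaction in the audit log.

```python
import time

# Hypothetical tool registry: each tool is a plain function that takes and
# returns a JSON-serializable dict (structured I/O reduces parsing mistakes).
def run_tests(args):
    return {"passed": 12, "failed": 0}

def delete_branch(args):
    return {"deleted": args["name"]}

TOOLS = {
    "run_tests": {"fn": run_tests, "destructive": False},
    "delete_branch": {"fn": delete_branch, "destructive": True},
}

MAX_CALLS = 10  # budget: prevents runaway loops

def dispatch(call_log, name, args, approved=False):
    """Execute one tool call under guardrails; successes go to the audit log."""
    if len(call_log) >= MAX_CALLS:
        return {"error": "call budget exhausted"}
    spec = TOOLS.get(name)
    if spec is None:
        # Least privilege: only allowlisted tools exist at all.
        return {"error": f"unknown tool: {name}"}
    if spec["destructive"] and not approved:
        # Human approval gate for destructive actions.
        return {"error": "human approval required"}
    result = spec["fn"](args)
    call_log.append({"tool": name, "args": args, "result": result, "ts": time.time()})
    return result
```

The key design choice is that the allowlist, budget, and approval gate all live outside the model: the model can only request calls, never bypass the checks.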
Common failure modes in tool calling
- Tool misuse: wrong parameters, wrong assumptions about tool behavior.
- Runaway loops: repeated calls without convergence.
- Fragile parsing: tool returns text; model misreads it.
- Security risks: tools that can do too much damage.
Tool calling is where “AI becomes an app.” Treat it like production engineering: guardrails, budgets, approvals, and audits.
Workflow: Multimodal (images/docs/audio)
Multimodal workflows matter when text isn’t enough: UI screenshots, diagrams, PDFs, or recordings. They can unlock powerful debugging and extraction workflows, but they add privacy and safety concerns.
When multimodal is the right tool
- Debugging UI layout issues from screenshots.
- Extracting structured information from documents/images.
- Summarizing meetings or audio notes into actions.
- Generating test cases from visual artifacts.
How to use multimodal safely
- Minimize sensitive data: redact user info, secrets, and internal identifiers.
- Ask for uncertainty: “If you can’t read a detail, say so.”
- Prefer structured extraction: output JSON with confidence/unknowns.
- Don’t over-trust screenshots: verify behavior in code; images can mislead.
Small visual ambiguities can create confident mistakes. Treat visual extraction as a hypothesis that you validate.
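One way to enforce the "structured extraction with confidence/unknowns" advice is to normalize whatever the model returns into a fixed shape. A minimal sketch, assuming hypothetical extraction targets (`invoice_number`, `total`, `due_date`):

```python
import json

FIELDS = ["invoice_number", "total", "due_date"]  # illustrative targets
CONFIDENCES = {"low", "medium", "high"}

def normalize_extraction(raw: str):
    """Coerce a model's extraction output into the fields/confidence contract.

    Unknown, missing, or malformed entries become
    {"value": None, "confidence": "low"}, so downstream code always
    sees the same shape and can route low-confidence fields to review.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    result = {}
    for field in FIELDS:
        entry = data.get(field) if isinstance(data, dict) else None
        if (isinstance(entry, dict)
                and "value" in entry
                and entry.get("confidence") in CONFIDENCES):
            result[field] = {"value": entry["value"],
                             "confidence": entry["confidence"]}
        else:
            result[field] = {"value": None, "confidence": "low"}
    return result
```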
Hybrid workflows (how to combine them)
Real projects often combine workflows. A common, effective pattern:
- Chat to plan and identify risks.
- Structured output to define the contract.
- Tools to verify and ground behavior.
- Repo to implement, test, and ship.
Example hybrid flow
- Chat: “Propose 3 schema designs and tradeoffs.”
- Structured: “Output JSON matching schema v2 only.”
- Tools: “Validate schema, run tests, show exact failures.”
- Repo: “Commit prompt + schema + validator + smoke tests.”
If you feel stuck, you’re often in the wrong workflow. Move from chat → structure → tools as the need for correctness increases.
How to choose the right workflow quickly
Use this quick decision table:
- Need options/tradeoffs? Chat.
- Need machine-readable output? Structured output.
- Need grounded truth or real actions? Tools.
- Need images/docs/audio? Multimodal.
And remember the “risk rule” from Part I: as blast radius increases, move toward structured outputs, tools, and verification gates.
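The decision table and risk rule can be expressed as a tiny routing function. This is one possible precedence, not the only valid one; the flag names are illustrative.

```python
def pick_workflow(needs_options=False, needs_parsing=False,
                  needs_grounding=False, has_media=False):
    """Map the decision table onto a single recommendation.

    Precedence reflects rising correctness requirements: non-text
    inputs and grounding take priority over open-ended chat.
    """
    if has_media:
        return "multimodal"
    if needs_grounding:
        return "tool calling"
    if needs_parsing:
        return "structured output"
    # Default: start loose in chat, tighten as blast radius grows.
    return "chat"
```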
Common pitfalls (and fixes)
Pitfall: using chat for JSON contracts
Fix: switch to structured output and validate results.
Pitfall: using structured output for brainstorming
Fix: brainstorm in chat with multiple candidates; then lock a schema.
Pitfall: giving tools too much power
Fix: least privilege, budgets, approvals, and audit logs.
Pitfall: leaking sensitive info via images/docs
Fix: redact, minimize, and treat all uploads as potentially sensitive.
If you don’t define what “done” means (and how you’ll verify), any workflow will feel inconsistent.
Copy-paste templates
Template: chat planning prompt
Goal:
...
Constraints:
- ...
Ask:
1) Propose 3 approaches with tradeoffs.
2) Recommend one and explain why.
3) List risks and verification steps.
Template: structured output prompt
Task:
...
Output requirements:
- Output JSON only
- Must match this schema:
{ ... }
Examples:
- Input: ...
Output: ...
Template: tool-grounded prompt
Use tools to verify before claiming success.
If a tool fails, report the exact error and propose the smallest next step.
Never guess file contents or command outputs.
Template: multimodal extraction prompt
Given this image/document:
1) Extract the requested fields as JSON.
2) For each field, include confidence (low/medium/high).
3) If unreadable/uncertain, set value to null and explain why.