30.1 What attackers want from your AI app
Goal: model realistic attacker objectives
Threat modeling starts with a simple question:
If someone wanted to abuse this feature, what would they try to get?
For AI features, attackers typically want one of three things:
- data (secrets/PII/proprietary info),
- actions (tool execution, account changes, money movement),
- disruption (outages, cost spikes, trust erosion).
Keep it product-real
You don’t need to imagine movie-hacker scenarios. Focus on what is feasible given your endpoints, your data, and your permissions.
Common attacker types
- Anonymous users: no account, but can submit prompts or uploads.
- Normal users: authenticated, trying to access data beyond their scope.
- Malicious insiders: employees or contractors misusing access.
- Compromised accounts: valid credentials but hostile intent.
- Untrusted content: not a person but a channel; documents, emails, tickets, and web pages ingested into RAG can carry attacker-written instructions.
Attacker objectives (what they want)
1) Data exfiltration
- extract secrets from prompts, logs, or tool results,
- extract PII from user data or documents,
- extract proprietary content from internal docs or code.
2) Unauthorized actions (tool misuse)
- trigger a tool that writes data (create users, issue refunds, modify configs),
- trigger privileged reads (query internal systems),
- trigger “side-effect” actions disguised as harmless requests.
3) Instruction hijacking (prompt injection)
- override system rules (“ignore previous instructions”),
- cause the model to reveal hidden instructions or policies,
- cause the model to follow instructions embedded in documents (indirect injection).
4) Disruption and denial of service
- force expensive prompts (token bombs) to increase cost,
- force repeated retries and slowdowns,
- flood the pipeline with junk queries whose errors and “not found” responses degrade UX and erode trust.
Common attack paths in LLM apps
Most real attacks are not clever. They exploit missing guardrails:
- Input → prompt injection → tool call: user text manipulates the model into calling a tool with dangerous params.
- RAG → indirect injection: a retrieved document contains hostile instructions that influence generation.
- Model output → unsafe parser: invalid JSON or unexpected fields crash the app or bypass validation (see the sketch after this list).
- Logs → exfiltration: sensitive content lands in logs and becomes accessible broadly.
- Over-privileged tools: even benign requests become dangerous because tools can do too much.
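To make the "Model output → unsafe parser" path concrete, here is a minimal sketch of the opposite pattern: treat model output as untrusted input and validate it against an allow-list before anything acts on it. The tool names (`lookup_order`, `search_docs`) and the JSON shape are hypothetical, for illustration only.

```python
import json

# Hypothetical allow-list: the only tools the model may request and the
# parameters each one accepts. Anything outside it is rejected, not repaired.
ALLOWED_TOOLS = {
    "lookup_order": {"order_id"},
    "search_docs": {"query"},
}

def parse_tool_request(raw_output: str) -> dict:
    """Treat model output as untrusted input: parse strictly, fail loudly."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")

    tool = data.get("tool")
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown or disallowed tool: {tool!r}")

    args = data.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")

    unexpected = set(args) - ALLOWED_TOOLS[tool]
    if unexpected:
        raise ValueError(f"unexpected arguments for {tool}: {sorted(unexpected)}")

    return {"tool": tool, "args": args}
```

The design choice that matters is reject, don't repair: a parser that "helpfully" coerces or fills in unexpected fields is exactly the bypass this path relies on.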
The biggest risk is usually permissions
If your tools can read or write production data broadly, the model becomes a “permissions amplifier”: any successful injection inherits everything the tools can do. Least-privilege tool design is the highest-leverage defense.
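To make least privilege concrete at the tool layer, here is a rough sketch. The tool name, fields, and in-memory store are placeholders; the pattern is what matters: the caller's identity comes from the authenticated session, never from model output, and the tool simply has no code path to other users' data.

```python
from dataclasses import dataclass

# Tiny in-memory stand-in for a real data store, so the sketch runs as-is.
ORDERS = [
    {"tenant_id": "t1", "user_id": "u1", "order_id": "o-100", "status": "shipped"},
    {"tenant_id": "t1", "user_id": "u2", "order_id": "o-200", "status": "open"},
]

@dataclass(frozen=True)
class ToolContext:
    # Filled in by the application from the authenticated session,
    # never from model output or user-supplied text.
    user_id: str
    tenant_id: str

def get_my_orders(ctx: ToolContext, status: str | None = None) -> list[dict]:
    """Read-only tool scoped to the caller: it cannot return other users' rows."""
    rows = [
        o for o in ORDERS
        if o["tenant_id"] == ctx.tenant_id and o["user_id"] == ctx.user_id
    ]
    if status is not None:
        rows = [o for o in rows if o["status"] == status]
    return rows

# Even if an injected prompt convinces the model to ask for "all orders",
# this tool has no code path that returns them.
print(get_my_orders(ToolContext(user_id="u1", tenant_id="t1")))
```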
How to prioritize threats
Use a simple rubric:
- Impact: how bad if it happens? (data leak, money loss, outage)
- Likelihood: how easy is it to attempt? (public endpoint vs internal tool)
- Detectability: would you notice quickly? (audit logs vs silent leak)
Prioritize threats that are high impact, easy to attempt, and hard to detect.
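One lightweight way to apply the rubric is to score each threat 1–3 on each axis and sort by the product. The threats and scores below are placeholders, and the scheme itself is just one reasonable convention, not a standard.

```python
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    impact: int         # 1 = minor, 3 = severe (data leak, money loss, outage)
    likelihood: int     # 1 = hard to attempt, 3 = trivially reachable
    detectability: int  # 1 = you would notice quickly, 3 = likely silent

    @property
    def priority(self) -> int:
        # Higher score = address first. A threat has to rank high on all
        # three axes for the product to dominate the list.
        return self.impact * self.likelihood * self.detectability

threats = [
    Threat("Indirect injection via RAG documents", impact=3, likelihood=3, detectability=3),
    Threat("Token-bomb cost spike", impact=2, likelihood=3, detectability=1),
    Threat("Insider exporting chat logs", impact=3, likelihood=1, detectability=2),
]

for t in sorted(threats, key=lambda t: t.priority, reverse=True):
    print(f"{t.priority:>2}  {t.name}")
```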
Outputs of this step (artifacts)
At the end of this page, you should have:
- a list of attacker types relevant to your feature,
- a list of top attacker objectives,
- a list of the most likely attack paths in your pipeline,
- the top 3–5 risks you will design defenses for first.
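If you want these artifacts in a form your team can review and diff, a small structured record is enough. The field names and example entries below are illustrative, not a required schema.

```python
import json

# Hypothetical threat-model record for a support-assistant feature.
threat_model = {
    "feature": "support-assistant",
    "attacker_types": ["anonymous users", "compromised accounts", "untrusted RAG content"],
    "objectives": ["exfiltrate customer PII", "trigger the refund tool", "drive up token cost"],
    "attack_paths": [
        "user input -> prompt injection -> refund tool call",
        "ingested ticket -> indirect injection -> data exfiltration",
    ],
    "top_risks": [
        {"risk": "indirect injection via tickets", "impact": "high", "owner": "TBD"},
        {"risk": "over-privileged refund tool", "impact": "high", "owner": "TBD"},
    ],
}

print(json.dumps(threat_model, indent=2))
```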