30.1 What attackers want from your AI app
Goal: model realistic attacker objectives
Threat modeling starts with a simple question:
If someone wanted to abuse this feature, what would they try to get?
For AI features, attackers typically want one of three things:
- data (secrets/PII/proprietary info),
- actions (tool execution, account changes, money movement),
- disruption (outages, cost spikes, trust erosion).
Keep it product-real
You don’t need to imagine movie-hacker scenarios. Focus on what is feasible given your endpoints, your data, and your permissions.
Common attacker types
- Anonymous users: no account, but can submit prompts or uploads.
- Normal users: authenticated, trying to access data beyond their scope.
- Malicious insiders: employees or contractors misusing access.
- Compromised accounts: valid credentials but hostile intent.
- Untrusted content: not a person but a channel; documents, emails, tickets, and web pages ingested into RAG can carry attacker-written instructions.
Attacker objectives (what they want)
1) Data exfiltration
- extract secrets from prompts, logs, or tool results,
- extract PII from user data or documents,
- extract proprietary content from internal docs or code.
2) Unauthorized actions (tool misuse)
- trigger a tool that writes data (create users, issue refunds, modify configs),
- trigger privileged reads (query internal systems),
- trigger “side-effect” actions disguised as harmless requests.
3) Instruction hijacking (prompt injection)
- override system rules (“ignore previous instructions”),
- cause the model to reveal hidden instructions or policies,
- cause the model to follow instructions embedded in documents (indirect injection).
4) Disruption and denial of service
- force expensive prompts (token bombs) to increase cost,
- force repeated retries and slowdowns,
- flood the pipeline with junk queries whose errors and “not found” responses degrade UX and erode trust.
Common attack paths in LLM apps
Most real attacks are not clever. They exploit missing guardrails:
- Input → prompt injection → tool call: user text manipulates the model into calling a tool with dangerous params.
- RAG → indirect injection: a retrieved document contains hostile instructions that influence generation.
- Model output → unsafe parser: invalid JSON or unexpected fields crash the app or bypass validation (see the sketch after this list).
- Logs → exfiltration: sensitive content lands in logs and becomes accessible broadly.
- Over-privileged tools: even benign requests become dangerous because tools can do too much.
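To make the "Model output → unsafe parser" path concrete, here is a minimal sketch of the opposite pattern: treat model output as untrusted input and validate it against an allow-list before anything acts on it. The tool names (`lookup_order`, `search_docs`) and the JSON shape are hypothetical, for illustration only.

```python
import json

# Hypothetical allow-list: the only tools the model may request and the
# parameters each one accepts. Anything outside it is rejected, not repaired.
ALLOWED_TOOLS = {
    "lookup_order": {"order_id"},
    "search_docs": {"query"},
}

def parse_tool_request(raw_output: str) -> dict:
    """Treat model output as untrusted input: parse strictly, fail loudly."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")

    tool = data.get("tool")
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown or disallowed tool: {tool!r}")

    args = data.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")

    unexpected = set(args) - ALLOWED_TOOLS[tool]
    if unexpected:
        raise ValueError(f"unexpected arguments for {tool}: {sorted(unexpected)}")

    return {"tool": tool, "args": args}
```

The design choice that matters is reject, don't repair: a parser that "helpfully" coerces or fills in unexpected fields is exactly the bypass this path relies on.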
The biggest risk is usually permissions
If your tools can read or write production data broadly, the model becomes a “permissions amplifier”: any successful injection inherits everything the tools can do. Least-privilege tool design is the highest-leverage defense.
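To make least privilege concrete at the tool layer, here is a rough sketch. The tool name, fields, and in-memory store are placeholders; the pattern is what matters: the caller's identity comes from the authenticated session, never from model output, and the tool simply has no code path to other users' data.

```python
from dataclasses import dataclass

# Tiny in-memory stand-in for a real data store, so the sketch runs as-is.
ORDERS = [
    {"tenant_id": "t1", "user_id": "u1", "order_id": "o-100", "status": "shipped"},
    {"tenant_id": "t1", "user_id": "u2", "order_id": "o-200", "status": "open"},
]

@dataclass(frozen=True)
class ToolContext:
    # Filled in by the application from the authenticated session,
    # never from model output or user-supplied text.
    user_id: str
    tenant_id: str

def get_my_orders(ctx: ToolContext, status: str | None = None) -> list[dict]:
    """Read-only tool scoped to the caller: it cannot return other users' rows."""
    rows = [
        o for o in ORDERS
        if o["tenant_id"] == ctx.tenant_id and o["user_id"] == ctx.user_id
    ]
    if status is not None:
        rows = [o for o in rows if o["status"] == status]
    return rows

# Even if an injected prompt convinces the model to ask for "all orders",
# this tool has no code path that returns them.
print(get_my_orders(ToolContext(user_id="u1", tenant_id="t1")))
```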
How to prioritize threats
Use a simple rubric:
- Impact: how bad if it happens? (data leak, money loss, outage)
- Likelihood: how easy is it to attempt? (public endpoint vs internal tool)
- Detectability: would you notice quickly? (audit logs vs silent leak)
Prioritize threats that are high impact, easy to attempt, and hard to detect.
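One lightweight way to apply the rubric is to score each threat 1–3 on each axis and sort by the product. The threats and scores below are placeholders, and the scheme itself is just one reasonable convention, not a standard.

```python
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    impact: int         # 1 = minor, 3 = severe (data leak, money loss, outage)
    likelihood: int     # 1 = hard to attempt, 3 = trivially reachable
    detectability: int  # 1 = you would notice quickly, 3 = likely silent

    @property
    def priority(self) -> int:
        # Higher score = address first. A threat has to rank high on all
        # three axes for the product to dominate the list.
        return self.impact * self.likelihood * self.detectability

threats = [
    Threat("Indirect injection via RAG documents", impact=3, likelihood=3, detectability=3),
    Threat("Token-bomb cost spike", impact=2, likelihood=3, detectability=1),
    Threat("Insider exporting chat logs", impact=3, likelihood=1, detectability=2),
]

for t in sorted(threats, key=lambda t: t.priority, reverse=True):
    print(f"{t.priority:>2}  {t.name}")
```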
Outputs of this step (artifacts)
At the end of this page, you should have:
- a list of attacker types relevant to your feature,
- a list of top attacker objectives,
- a list of the most likely attack paths in your pipeline,
- the top 3–5 risks you will design defenses for first.
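If you want these artifacts in a form your team can review and diff, a small structured record is enough. The field names and example entries below are illustrative, not a required schema.

```python
import json

# Hypothetical threat-model record for a support-assistant feature.
threat_model = {
    "feature": "support-assistant",
    "attacker_types": ["anonymous users", "compromised accounts", "untrusted RAG content"],
    "objectives": ["exfiltrate customer PII", "trigger the refund tool", "drive up token cost"],
    "attack_paths": [
        "user input -> prompt injection -> refund tool call",
        "ingested ticket -> indirect injection -> data exfiltration",
    ],
    "top_risks": [
        {"risk": "indirect injection via tickets", "impact": "high", "owner": "TBD"},
        {"risk": "over-privileged refund tool", "impact": "high", "owner": "TBD"},
    ],
}

print(json.dumps(threat_model, indent=2))
```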