OWASP Top 10 LLM Explained: Every Entry, What It Means, and What to Fix
The OWASP Top 10 for LLM Applications 2025 is the canonical vulnerability taxonomy for production AI systems. Here is every entry, what it means in practice, and the highest-return mitigations.
The OWASP Top 10 for LLM Applications 2025 is the closest thing the industry has to a canonical vulnerability taxonomy for systems built on language models — and if you want the owasp top 10 llm explained without the vendor spin, this is it. OWASP’s GenAI Security Project published the 2025 revision in late 2024, replacing five entries from the 2023/24 list to reflect how LLM deployments have actually broken in production. This is not a theoretical risk framework. It is assembled from real incidents, real CVEs, and real red-team findings across the community.
The Ten Entries
LLM01:2025 — Prompt Injection
An attacker slips instructions into content the model processes and hijacks its behavior. Direct injection hits the model through user-controlled input. Indirect injection is more dangerous in practice: a malicious document that the model summarizes, a poisoned webpage in a RAG pipeline, or a tool-call response that overrides prior system instructions. This holds the top slot because everything agentic makes it worse — the more tools a model controls, the broader the blast radius of a successful injection. There is no complete technical fix; the model cannot reliably distinguish instructions from data when they share the same natural-language channel.
For a practitioner breakdown of how prompt injection escalates in multi-step agent pipelines, aisec.blog ↗ tracks real attack patterns and jailbreak disclosures.
LLM02:2025 — Sensitive Information Disclosure
Models trained on real data memorize more than intended. The risk is twofold: training-data extraction (asking the right questions to recover verbatim fragments of private documents) and runtime leakage (system prompts and downstream context surfacing credentials, PII, or business logic because nobody enforced output filtering). This jumped from sixth in the 2023 list to second in 2025 — a reflection of how many production deployments have leaked system prompts through trivially simple bypass prompts.
LLM03:2025 — Supply Chain
Every third-party component is an attack surface: the base model, fine-tuning datasets, LoRA adapters, inference APIs, plugins, and RAG data sources. A backdoored open-source model changes behavior silently on a specific trigger token. A poisoned fine-tuning dataset shifts outputs in ways that do not surface in standard capability evals because the attacker controls the trigger condition. The 2025 list explicitly calls out software bill of materials (SBOM) generation as a required control, which signals how seriously OWASP is treating the model-as-dependency problem.
LLM04:2025 — Data and Model Poisoning
Overlaps with supply chain but focuses on training-time contamination specifically. Trigger-based backdoor attacks are the canonical shape: the model behaves normally until a specific phrase appears in input, then produces attacker-controlled output. Detection is hard because attackers deliberately narrow the trigger condition to avoid standard evaluation benchmarks. Validation pipelines must treat incoming training data as adversarial input, the same way web applications treat user-supplied form data.
LLM05:2025 — Improper Output Handling
The model’s response gets passed to a downstream system without sanitization. XSS and SQL injection land here, but the vector is the LLM rather than a human user. When a model writes JavaScript that a browser renders, or generates SQL that an application executes directly, output encoding is as mandatory as it is for any other user-supplied string. The difference is that model outputs look authoritative and developers often trust them without thinking.
LLM06:2025 — Excessive Agency
Agentic systems grant models tools — file access, email, shell execution, API calls. When those grants exceed what the task requires, a single successful prompt injection (or even a hallucination) can have irreversible consequences. The fix is straightforward: scope every tool grant to the minimum permission set, and require human-in-the-loop confirmation for any action that cannot be undone. Most deployed agents implement neither.
Defensive guardrail patterns for constraining agentic systems are covered by guardml.io ↗, which tracks production constraint tooling and content filter implementations.
LLM07:2025 — System Prompt Leakage
System prompts carry instructions, guardrails, and often credentials. Models reproduce them verbatim when asked the right way — role-play bypasses, continuation attacks, and jailbreaks that route around the instruction layer all extract system prompt content reliably enough that it cannot be treated as a secret. The correct architecture is to keep secrets out of the system prompt entirely: credentials go in a secrets manager; the system prompt carries only behavioral instructions, which you should assume the user can read.
LLM08:2025 — Vector and Embedding Weaknesses
RAG architectures retrieve from vector stores. Those stores can be poisoned with malicious documents that redirect model behavior, and embedding inversion attacks can extract sensitive information from the embedding representations themselves. Access controls on vector databases are often absent or coarse-grained, because the tooling is new and the security defaults are not hardened. Per-user ACLs on retrieved chunks are the right mitigation; they are non-trivial to implement correctly.
LLM09:2025 — Misinformation
Hallucination is a security problem when outputs drive decisions. Medical recommendations, legal citations, code security analysis — all domains where a confident-sounding wrong answer causes harm. Retrieval-augmented generation reduces hallucination rate but does not eliminate it. Human review remains mandatory for any output that affects a decision the organization cannot reverse. Do not treat model confidence scores as ground truth.
LLM10:2025 — Unbounded Consumption
Prompt flooding, denial-of-service through expensive inference, and cost exploitation through unmetered API access. Token-intensive adversarial inputs — recursive loop prompts, extremely long context payloads — can drain compute budgets with no exploit required beyond knowing how the billing model works. Rate limiting and per-user token quotas are the obvious controls. Most deployments do not implement them until they receive an unexpected bill.
What the 2025 Revision Changed
The 2023/24 list included Training Data Poisoning, Model Denial of Service, and Over-Reliance as distinct entries. The 2025 revision absorbed several of these into broader categories — Unbounded Consumption covers denial-of-service and cost exploitation; Misinformation covers the over-reliance pattern — and added entries that reflect how production systems evolved: Excessive Agency and System Prompt Leakage exist because agentic deployments are now widespread, and Vector and Embedding Weaknesses exists because RAG became the default architecture for grounding model outputs in real data.
The consistent trend is that the risk surface expanded the moment models started taking actions rather than just generating text.
Highest-Return Mitigations
Fix these before anything else:
- Scope every tool grant (LLM06). Minimum permission set, no exceptions. Add human approval for irreversible actions.
- Sanitize all model outputs (LLM05). Treat every model response as untrusted user input before it touches a browser, database, or shell.
- Harden against indirect injection (LLM01). Separate instruction channels from retrieved data where architecturally possible; add output-layer behavior monitoring.
- Audit the model supply chain (LLM03). SBOM for every dependency including base models and adapters; verify checksums at deployment time.
- Enforce rate limits and token quotas (LLM10). Per-user caps on every public inference endpoint, set before go-live, not after the first incident.
None of these require novel AI-specific technology. They require the same engineering discipline applied to web applications for the past two decades, applied consistently to a new attack surface that most security teams are still treating as out of scope.
Sources
-
OWASP Top 10 for LLM Applications 2025 — Official Project Page ↗ — The primary source. Maintained by the OWASP GenAI Security Project; includes links to the versioned PDF releases and the working group contributing organization.
-
OWASP Top 10 for LLMs v2025 PDF ↗ — The authoritative document. Version 2025 (v2025), released November 2024. Contains full vulnerability descriptions, real-world scenarios, and mitigation guidance per entry.
-
OWASP Top 10 LLM Updated 2025: Examples and Mitigation Strategies ↗ — Oligo Security’s practitioner breakdown of the 2025 list, including attack examples and specific mitigation approaches per vulnerability class.
Sources
AI Attacks — in your inbox
Practitioner-grade AI red team techniques and tooling. — delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.
Related
Tool-Call Hijacking in Agentic Systems
How attackers exploit the gap between LLM reasoning and actual function execution to trigger unauthorized tool calls — exfiltration via email, rogue database writes, and API key theft — and what mitigations actually close the gap.
Adversarial Suffixes: A GCG Practitioner Guide
A working guide to Greedy Coordinate Gradient search — how the algorithm finds adversarial suffixes that bypass safety alignment, what the transferability result means in practice, and how red teams use it today.
LLM Context Window Poisoning
Persistent malicious instructions via memory and context manipulation — how attackers plant long-horizon influence across LLM sessions and what it takes to detect it.