All posts
-
OWASP Top 10 LLM Explained: Every Entry, What It Means, and What to Fix
The OWASP Top 10 for LLM Applications 2025 is the canonical vulnerability taxonomy for production AI systems. Here is every entry, what it means in practice, and the highest-return mitigations.
-
Evasion Attacks on Production Classifiers: Malware, Spam, and Fraud
Deployed ML classifiers in malware, spam, and fraud detection face evasion attacks where the attacker has a clear payoff. How the attacks work against real systems, why black-box transfer is the practical threat, and what actually raises the cost of evasion.
-
Poisoning Web-Scale Training Sets: Split-View and Frontrunning
You don't need to control a model's training pipeline to poison it — you only need to control content the crawler will fetch. How split-view and frontrunning poisoning work against web-scale datasets, and the integrity controls that defend the pipeline.
-
Adversarial Examples Against Vision Models in 2025
Where physical-world adversarial patches and digital attacks stand against modern vision models — what still works, what's been hardened, and where the research frontier is.
-
Adversarial Suffixes: A GCG Practitioner Guide
A working guide to Greedy Coordinate Gradient search — how the algorithm finds adversarial suffixes that bypass safety alignment, what the transferability result means in practice, and how red teams use it today.
-
Jailbreaking Multimodal Models: Visual Prompt Injection Attacks
How attackers use images, typography, and adversarial visual inputs to bypass safety guardrails in GPT-4V, Claude, and Gemini — and why multimodal inputs fundamentally expand the jailbreak attack surface.
-
LLM Jailbreaking via Many-Shot Prompting
How prepending hundreds of synthetic compliance examples to a long-context prompt erodes safety training — the mechanics, empirical results, and why this is structurally difficult to fix.
-
Model Extraction via Black-Box Query Attacks
How attackers reconstruct private model weights and decision boundaries through query-only access — the techniques, the economics, and what extracted models are actually used for.
-
Supply Chain Attacks on AI Models: Poisoning and Backdoors
How attackers compromise AI models before they reach production — through malicious fine-tuning, dataset poisoning, serialization exploits, and the unique risks of public model registries like Hugging Face Hub.
-
LLM Context Window Poisoning
How attackers plant persistent malicious instructions through memory and context manipulation to gain long-horizon influence across LLM sessions, and what it takes to detect them.
-
Model Inversion and Membership Inference: Extracting LLM Data
How membership inference attacks determine whether specific data was used to train a model, and how model inversion techniques reconstruct private training examples from gradient signals and output distributions.
-
Indirect Prompt Injection in RAG Pipelines
How attackers embed malicious instructions in documents that get retrieved into LLM context — and why RAG makes prompt injection a supply-chain problem.
-
Tool-Call Hijacking in Agentic Systems
How attackers exploit the gap between LLM reasoning and actual function execution to trigger unauthorized tool calls — exfiltration via email, rogue database writes, and API key theft — and what mitigations actually close the gap.
-
Training Data Poisoning and Backdoor Attacks on LLMs
A technical deep-dive into how adversaries manipulate training datasets and introduce hidden backdoors into LLMs — covering poisoning mechanics, stealthy trigger design, and why standard evaluations miss these attacks.
-
Building a CI Gate for Prompt Injection Regression
Stop shipping prompt-engineering changes that silently weaken your guardrails. A practical CI gate that catches injection regressions before they hit production.
-
What this site is for
AI Attacks covers offensive AI security from a working practitioner's perspective. Here's what we publish.