All posts

OWASP Top 10 LLM Explained: Every Entry, What It Means, and What to Fix

The OWASP Top 10 for LLM Applications 2025 is the canonical vulnerability taxonomy for production AI systems. Here is every entry, what it means in practice, and the highest-return mitigations.
June 12, 2026
Evasion Attacks on Production Classifiers: Malware, Spam, and Fraud

Deployed ML classifiers in malware, spam, and fraud detection face evasion attacks where the attacker has a clear payoff. How the attacks work against real systems, why black-box transfer is the practical threat, and what actually raises the cost of evasion.
May 22, 2026
Poisoning Web-Scale Training Sets: Split-View and Frontrunning

You don't need to control a model's training pipeline to poison it — you only need to control content the crawler will fetch. How split-view and frontrunning poisoning work against web-scale datasets, and the integrity controls that defend the pipeline.
May 22, 2026
Adversarial Examples Against Vision Models in 2025

Where physical-world adversarial patches and digital attacks stand against modern vision models — what still works, what's been hardened, and where the research frontier is.
May 9, 2026
Adversarial Suffixes: A GCG Practitioner Guide

A working guide to Greedy Coordinate Gradient search — how the algorithm finds adversarial suffixes that bypass safety alignment, what the transferability result means in practice, and how red teams use it today.
May 9, 2026
Jailbreaking Multimodal Models: Visual Prompt Injection Attacks

How attackers use images, typography, and adversarial visual inputs to bypass safety guardrails in GPT-4V, Claude, and Gemini — and why multimodal inputs fundamentally expand the jailbreak attack surface.
May 9, 2026
LLM Jailbreaking via Many-Shot Prompting

How prepending hundreds of synthetic compliance examples to a long-context prompt erodes safety training — the mechanics, empirical results, and why this is structurally difficult to fix.
May 9, 2026
Model Extraction via Black-Box Query Attacks

How attackers reconstruct private model weights and decision boundaries through query-only access — the techniques, the economics, and what extracted models are actually used for.
May 9, 2026
Supply Chain Attacks on AI Models: Poisoning and Backdoors

How attackers compromise AI models before they reach production through malicious fine-tuning, dataset poisoning, and serialization exploits, and why public model registries such as Hugging Face Hub carry unique risks.
May 9, 2026
LLM Context Window Poisoning

How attackers plant persistent malicious instructions through memory and context manipulation to gain long-horizon influence across LLM sessions, and what it takes to detect them.
May 9, 2026
Model Inversion and Membership Inference: Extracting LLM Data

How membership inference attacks determine whether specific data was used to train a model, and how model inversion techniques reconstruct private training examples from gradient signals and output distributions.
May 9, 2026
Indirect Prompt Injection in RAG Pipelines

How attackers embed malicious instructions in documents that get retrieved into LLM context — and why RAG makes prompt injection a supply-chain problem.
May 9, 2026
Tool-Call Hijacking in Agentic Systems

How attackers exploit the gap between LLM reasoning and actual function execution to trigger unauthorized tool calls — exfiltration via email, rogue database writes, and API key theft — and what mitigations actually close the gap.
May 9, 2026
Training Data Poisoning and Backdoor Attacks on LLMs

A technical deep-dive into how adversaries manipulate training datasets and introduce hidden backdoors into LLMs — covering poisoning mechanics, stealthy trigger design, and why standard evaluations miss these attacks.
May 9, 2026
Building a CI Gate for Prompt Injection Regression

Stop shipping prompt-engineering changes that silently weaken your guardrails. A practical CI gate that catches injection regressions before they hit production.
May 6, 2026
What this site is for

AI Attacks covers offensive AI security from a working practitioner's perspective. Here's what we publish.
May 2, 2026