Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Tags
- #adversarial-ml 6
- #prompt-injection 4
- #red-teaming 4
- #jailbreaking 3
- #attack-patterns 2
- #backdoor-attacks 2
- #fine-tuning 2
- #llm-security 2
- #supply-chain 2
- #adversarial-patches 1
- #adversarial-suffix 1
- #agent-security 1
- #agentic-ai 1
- #ai-attacks 1
- #black-box-attacks 1
- #ci-cd 1
- #context-poisoning 1
- #data-poisoning 1
- #evasion 1
- #function-calling 1
- #garak 1
- #gcg 1
- #gpt-4v 1
- #hugging-face 1
- #indirect-injection 1
- #ip-theft 1
- #llm-attacks 1
- #long-context 1
- #many-shot 1
- #membership-inference 1
- #memory-attacks 1
- #meta 1
- #model-extraction 1
- #model-inversion 1
- #model-poisoning 1
- #model-security 1
- #multimodal 1
- #optimization-attacks 1
- #persistence 1
- #privacy-attacks 1
- #rag 1
- #red-team 1
- #regression-testing 1
- #safety-training 1
- #tool-call-hijacking 1
- #training-data-extraction 1
- #trojan-ml 1
- #vision-models 1
- #visual-prompt-injection 1
- #white-box 1
Categories
attack-patterns 5 posts
- Jailbreaking Multimodal Models: Visual Prompt Injection Attacks
  How attackers use images, typography, and adversarial visual inputs to bypass safety guardrails in GPT-4V, Claude, and Gemini, and why multimodal inputs fundamentally expand the jailbreak attack surface.
- Supply Chain Attacks on AI Models: Poisoning and Backdoors
  How attackers compromise AI models before they reach production: through malicious fine-tuning, dataset poisoning, serialization exploits, and the unique risks of public model registries like Hugging Face Hub.
- LLM Context Window Poisoning
  Persistent malicious instructions via memory and context manipulation: how attackers plant long-horizon influence across LLM sessions and what it takes to detect it.
- Indirect Prompt Injection in RAG Pipelines
  How attackers embed malicious instructions in documents that get retrieved into LLM context, and why RAG makes prompt injection a supply-chain problem.
- Tool-Call Hijacking in Agentic Systems
  How attackers exploit the gap between LLM reasoning and actual function execution to trigger unauthorized tool calls (exfiltration via email, rogue database writes, and API key theft), and what mitigations actually close the gap.
adversarial-ml 4 posts
- Adversarial Examples Against Vision Models in 2025
  Where physical-world adversarial patches and digital attacks stand against modern vision models: what still works, what's been hardened, and where the research frontier is.
- Model Extraction via Black-Box Query Attacks
  How attackers reconstruct private model weights and decision boundaries through query-only access: the techniques, the economics, and what extracted models are actually used for.
- Model Inversion and Membership Inference: Extracting LLM Data
  How membership inference attacks determine whether specific data was used to train a model, and how model inversion techniques reconstruct private training examples from gradient signals and output distributions.
- Training Data Poisoning and Backdoor Attacks on LLMs
  A technical deep-dive into how adversaries manipulate training datasets and introduce hidden backdoors into LLMs, covering poisoning mechanics, stealthy trigger design, and why standard evaluations miss these attacks.