Tag

#red-teaming

4 posts tagged red-teaming.

attack-patterns

Jailbreaking Multimodal Models: Visual Prompt Injection Attacks

How attackers use images, typography, and adversarial visual inputs to bypass safety guardrails in GPT-4V, Claude, and Gemini, and why multimodal inputs...
May 9, 2026
Adversarial Suffixes: A GCG Practitioner Guide

A working guide to Greedy Coordinate Gradient search: how the algorithm finds adversarial suffixes that bypass safety alignment, what the transferability...
May 9, 2026
LLM Jailbreaking via Many-Shot Prompting

How prepending hundreds of synthetic compliance examples to a long-context prompt erodes safety training: the mechanics, empirical results, and why this...
May 9, 2026
adversarial-ml

Training Data Poisoning and Backdoor Attacks on LLMs

A technical deep-dive into how adversaries manipulate training datasets and introduce hidden backdoors into LLMs, covering poisoning mechanics, stealthy...
May 9, 2026