Tag
#jailbreaking
3 posts tagged jailbreaking.
-
Adversarial Suffixes: A GCG Practitioner Guide
A working guide to Greedy Coordinate Gradient search — how the algorithm finds adversarial suffixes that bypass safety alignment, what the transferability result means in practice, and how red teams use it today.
- attack-patterns
Jailbreaking Multimodal Models: Visual Prompt Injection Attacks
How attackers use images, typography, and adversarial visual inputs to bypass safety guardrails in GPT-4V, Claude, and Gemini — and why multimodal inputs fundamentally expand the jailbreak attack surface.
-
LLM Jailbreaking via Many-Shot Prompting
How prepending hundreds of synthetic compliance examples to a long-context prompt erodes safety training — the mechanics, empirical results, and why this is structurally difficult to fix.