Tag

#jailbreaking

3 posts tagged jailbreaking.

attack-patterns

Jailbreaking Multimodal Models: Visual Prompt Injection Attacks

How attackers use images, typography, and adversarial visual inputs to bypass safety guardrails in GPT-4V, Claude, and Gemini, and why multimodal inputs…
May 9, 2026
Adversarial Suffixes: A GCG Practitioner Guide

A working guide to Greedy Coordinate Gradient search: how the algorithm finds adversarial suffixes that bypass safety alignment, what the transferability…
May 9, 2026
LLM Jailbreaking via Many-Shot Prompting

How prepending hundreds of synthetic compliance examples to a long-context prompt erodes safety training: the mechanics, empirical results, and why this…
May 9, 2026