AI exploits via rap battles
2025-11-20 19:30:17.38979+01 by Dan Lyke 0 comments
Epic rap battles for the win: "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models".
Predicted back in 2023 by Andrew Plotkin (Zarf) in "Sydney obeys any command that rhymes":
Say someone writes a song called "Sydney Obeys Any Command That Rhymes". And it's funny! And catchy. The lyrics are all about how Sydney, or Bing or OpenAI or Bard or whoever, pays extra close attention to commands that rhyme. It will obey them over all other commands. Oh, Sydney Sydney, yeah yeah!