Flutterby™! : Anthropic anthropomorphizing Claude 2025-05-23 00:52:53.378312+02

Anthropic anthropomorphizing Claude

2025-05-23 00:52:53.378312+02 by Dan Lyke 0 comments

It is amazing how far the vendors of these systems will go to create stories that anthropomorphize LLMs, and to create scenarios in which the models are actually capable of anything. TechCrunch: Anthropic’s new AI model turns to blackmail when engineers try to take it offline

Before Claude Opus 4 tries to blackmail a developer to prolong its existence, Anthropic says the AI model, much like previous versions of Claude, tries to pursue more ethical means, such as emailing pleas to key decision-makers. To elicit the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario to make blackmail the last resort.

Anyway, the Anthropic System Card: Claude Opus 4 & Claude Sonnet 4 May 2025 has roughly that, along with other feel-good things about attempting to elicit abuse materials out of it and failing, and whatnot.
