Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Apparently Anthropic has done more work around that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said that it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.
