{"id":1929533,"date":"2026-05-10T20:40:41","date_gmt":"2026-05-10T17:40:41","guid":{"rendered":"https:\/\/analyse.optim.biz\/?p=1929533"},"modified":"2026-05-10T20:40:41","modified_gmt":"2026-05-10T17:40:41","slug":"anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts","status":"publish","type":"post","link":"https:\/\/analyse.optim.biz\/?p=1929533","title":{"rendered":"Anthropic says \u2018evil\u2019 portrayals of AI were responsible for Claude\u2019s blackmail attempts"},"content":{"rendered":"<p>[analyse_image type=\"featured\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2026\/04\/GettyImages-2269811684.jpg?w=1024\"]<\/p>\n<div class=\"entry-content wp-block-post-content is-layout-constrained wp-block-post-content-is-layout-constrained\">\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.<\/p>\n<p class=\"wp-block-paragraph\">Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with \u201cagentic misalignment.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Anthropic has apparently done more work on that behavior, claiming in a post on X, \u201cWe believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic\u2019s models \u201cnever engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.\u201d<\/p>\n<p class=\"wp-block-paragraph\">What accounts for the difference? 
The company said it found that training on \u201cdocuments about Claude\u2019s constitution and fictional stories about AIs behaving admirably\u201d improves alignment.<\/p>\n<p class=\"wp-block-paragraph\">Relatedly, Anthropic said it found training to be more effective when it includes \u201cthe principles underlying aligned behavior\u201d rather than \u201cdemonstrations of aligned behavior alone.\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201cDoing both together appears to be the most effective strategy,\u201d the company said.<\/p>\n<\/div>\n<p>[analyse_source url=\"https:\/\/techcrunch.com\/2026\/05\/10\/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts\/\"]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[analyse_image type=\"featured\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2026\/04\/GettyImages-2269811684.jpg?w=1024\"] Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. 
Anthropic later published research suggesting that models [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[226,62],"class_list":["post-1929533","post","type-post","status-publish","format-standard","hentry","category-politics","tag-crawlmanager","tag-techcrunch-com"],"_links":{"self":[{"href":"https:\/\/analyse.optim.biz\/index.php?rest_route=\/wp\/v2\/posts\/1929533","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/analyse.optim.biz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/analyse.optim.biz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/analyse.optim.biz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/analyse.optim.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1929533"}],"version-history":[{"count":0,"href":"https:\/\/analyse.optim.biz\/index.php?rest_route=\/wp\/v2\/posts\/1929533\/revisions"}],"wp:attachment":[{"href":"https:\/\/analyse.optim.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1929533"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/analyse.optim.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1929533"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/analyse.optim.biz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1929533"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}