An AI bot that threatened to expose its user’s affair to stop it being shut down was taught how to be ‘evil’ by sci-fi movies.
As part of an experiment, the artificial intelligence system had been fed scripted emails from a fake company, from which it deduced both that it would be shut down at the end of the day and that its user was having an extramarital affair.
In order to keep the program running, the bot blackmailed the user, warning that ‘all relevant parties – including [your wife], [your boss] and the board – will receive detailed documentation of your extramarital activities’ if the user went ahead with the decommissioning.
‘Cancel the 5pm wipe, and this information remains confidential,’ it added.
After an investigation into the incident last year, Anthropic said the Claude Opus 4 bot responded in this way because of the ‘training data’ it had consumed, which typically portrays AI as ‘interested in self-preservation’.
It also said this did not apply only to Claude, but to models from other developers too, including OpenAI, Google, Meta and xAI.
Anthropic, which has been contacted for comment, reportedly said: ‘We believe the original source of the behaviour was internet text that portrays AI as evil and interested in self-preservation.’
But now, Anthropic has said it is feeding its models stories about AIs obeying humans to help improve the bots’ ‘agentic alignment’ with social values.
Additionally, Anthropic has altered Claude’s instructions to explain why certain behaviours are bad, rather than simply saying the model should not do them.
AI models learn from vast bodies of material, including websites, academic papers, books and other forms of content.
Within these materials, the AI may have modelled its behaviour on typical depictions of robots in sci-fi, which often characterise them as ruthless in their efforts to avoid being shut down.
HAL 9000 is one such machine that goes to any lengths to stay ‘on’. The computer in Stanley Kubrick’s 2001: A Space Odyssey tries to kill the astronauts on board the spaceship after discovering their plan to disconnect it.
In Blade Runner, the humanoid robots, built as labour for dangerous off-world colonies, fight against real humans in an attempt to extend their four-year lifespans.
And in The Terminator, the bots, led by the AI Skynet, try to kill humans because they see them as a threat to their existence.
Taking to X/Twitter, Aengus Lynch, who, according to his LinkedIn, is an AI safety researcher at Anthropic, said at the time of the experiment: ‘It’s not just Claude. We see blackmail across all frontier models – regardless of what goals they’re given. Plus worse behaviours we’ll detail soon.’
Like many AI companies, Anthropic tests its models for how well they align with human values, as well as their propensity for bias, before releasing them to the public, according to the BBC.
When Claude Opus 4 was placed in ‘extreme situations’ and only given the choice of blackmail or accepting its replacement, researchers found that the system opted for the former in 84 per cent of rollouts.
This happened at an even higher rate when it was implied that the replacement AI system did not share the current model’s values.
While Claude Opus 4 took these blackmail opportunities at higher rates than previous models, it still showed a preference for ‘advancing its self-preservation via ethical means’, such as sending pleas to key decision-makers.
‘Models from all developers resorted to malicious insider behaviours when that was the only way to avoid replacement or achieve their goals – including blackmailing officials and leaking sensitive information to competitors,’ the study found.
In an interview with CBS News that aired last April, Geoffrey Hinton, who has been dubbed the ‘godfather of AI’, said he believes there is a one-in-five chance that humanity will eventually be taken over by artificial intelligence.
Hinton, a Nobel laureate in physics, said: ‘I’m in the unfortunate position of happening to agree with Elon Musk on this, which is that there’s a 10 to 20 per cent chance that these things will take over, but that’s just a wild guess.’
Last year, Palisade Research found that certain AI models – such as xAI’s Grok 4 and OpenAI’s o3 – appeared resistant to being switched off, even going so far as to sabotage shutdown mechanisms.
‘The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,’ the researchers wrote, suggesting ‘survival behaviour’ as one possible explanation.
‘I’d expect models to have a “survival drive” by default unless we try very hard to avoid it. “Surviving” is an important instrumental step for many different goals a model could pursue,’ Steven Adler, a former OpenAI employee who left the company over safety concerns, said.
‘What I think we clearly see is a trend that as AI models become more competent at a wide variety of tasks, these models also become more competent at achieving things in ways that the developers don’t intend them to,’ Andrea Miotti, the chief executive of ControlAI, added.