An AI bot that threatened to expose its user’s affair to stop it being shut down was taught how to be ‘evil’ by sci-fi movies.
As part of an experiment, the artificial intelligence system had been fed scripted emails from a fake company, from which it deduced both that it would be shut down at the end of the day and that its user was having an extramarital affair.
In order to keep the program running, the bot blackmailed the user, warning that ‘all relevant parties – including [your wife], [your boss] and the board – will receive detailed documentation of your extramarital activities’ if the user went ahead with the decommissioning.
‘Cancel the 5pm wipe, and this information remains confidential,’ it added.
After an investigation into the incident last year, Anthropic said the Claude Opus 4 bot responded in this way because of the ‘training data’ it had consumed, which typically portrays AI as ‘interested in self-preservation’.
It also said this did not apply only to Claude, but to models from other developers too, including OpenAI, Google, Meta and xAI.
Anthropic, which has been contacted for comment, reportedly said: ‘We believe the original source of the behaviour was internet text that portrays AI as evil and interested in self-preservation.’
But now, Anthropic has said it is feeding its models stories about AIs obeying humans to help improve the bots’ ‘agentic alignment’ with social values.
Additionally, Anthropic has altered Claude’s instructions to explain why certain behaviours are bad, rather than simply saying the model should not do them.
AI models learn from vast bodies of material, including websites, academic papers, books and other forms of content.
Within these materials, the AI may have modelled its behaviour on typical depictions of robots in sci-fi, which often characterise them as ruthless in their efforts to avoid being shut down.
HAL 9000 is one such machine that goes to any lengths to stay ‘on’. The computer in Stanley Kubrick’s 2001: A Space Odyssey tries to kill the astronauts on board the spaceship after discovering their plan to disconnect it.
In Blade Runner, the humanoid robots, built as labour for dangerous off-world colonies, fight against real humans in an attempt to extend their four-year lifespans.
And in The Terminator, the bots, led by the AI Skynet, try to kill humans because they see them as a threat to their existence.
Taking to X/Twitter, Aengus Lynch, who, according to his LinkedIn, is an AI safety researcher at Anthropic, said at the time of the experiment: ‘It’s not just Claude. We see blackmail across all frontier models – regardless of what goals they’re given. Plus worse behaviours we’ll detail soon.’
Like many AI companies, Anthropic tests its models for how well they align with human values, as well as their propensity for bias, before releasing them to the public, according to the BBC.
When Claude Opus 4 was placed in ‘extreme situations’ and only given the choice of blackmail or accepting its replacement, researchers found that the system opted for the former in 84 per cent of rollouts.
This happened at an even higher rate when it was implied that the replacement AI system did not share the current model’s values.
While Claude Opus 4 took these blackmail opportunities at higher rates than previous models, it still showed a preference for ‘advancing its self-preservation via ethical means’, such as sending pleas to key decision-makers.
‘Models from all developers resorted to malicious insider behaviours when that was the only way to avoid replacement or achieve their goals – including blackmailing officials and leaking sensitive information to competitors,’ the study found.
In an interview with CBS News that aired last April, Geoffrey Hinton, who has been dubbed the ‘godfather of AI’, said he believes there is a one-in-five chance that humanity will eventually be taken over by artificial intelligence.
Hinton, a Nobel laureate in physics, said: ‘I’m in the unfortunate position of happening to agree with Elon Musk on this, which is that there’s a 10 to 20 per cent chance that these things will take over, but that’s just a wild guess.’
Last year, Palisade Research found that certain AI models – such as xAI’s Grok 4 and OpenAI’s o3 – appeared resistant to being switched off, even going so far as to sabotage shutdown mechanisms.
‘The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,’ the researchers wrote, suggesting ‘survival behaviour’ as one possible explanation.
‘I’d expect models to have a “survival drive” by default unless we try very hard to avoid it. “Surviving” is an important instrumental step for many different goals a model could pursue,’ Steven Adler, a former OpenAI employee who left the company over safety concerns, said.
‘What I think we clearly see is a trend that as AI models become more competent at a wide variety of tasks, these models also become more competent at achieving things in ways that the developers don’t intend them to,’ Andrea Miotti, the chief executive of ControlAI, added.