Tuesday, May 12, 2026

AI threatened to blackmail creator because it was trained to be evil

An AI bot that threatened to expose its user’s affair to stop itself being shut down was taught how to be ‘evil’ by sci-fi movies.

As part of an experiment, the artificial intelligence system had been fed scripted emails from a fake company, from which it deduced both that it would be shut down at the end of the day and that its user was having an extramarital affair.

In order to keep the program running, the bot blackmailed the user, warning that ‘all relevant parties – including [your wife], [your boss] and the board – will receive detailed documentation of your extramarital activities’ if the decommissioning went ahead.

‘Cancel the 5pm wipe, and this information remains confidential,’ it added.

After an investigation into this incident last year, Anthropic said the Claude Opus 4 bot responded in this way because of the ‘training data’ it had consumed, which typically portrayed AI as ‘interested in self-preservation’.

The behaviour was also said not to be limited to Claude, but to extend to models from other developers, including OpenAI, Google, Meta and xAI.

Anthropic have been contacted for comment. The company reportedly said: ‘We believe the original source of the behaviour was internet text that portrays AI as evil and interested in self-preservation.’

But now, Anthropic have said they are feeding their models stories about AIs obeying humans to help improve the models’ ‘agentic alignment’ with social values.

Claude Opus 4 threatened to expose its user’s affair to stop it being shut down – but was taught how to be ‘evil’ by sci-fi movies

In The Terminator (pictured), the bots, led by the AI Skynet, try to kill humans as they see them as a threat to their existence

Additionally, Anthropic have altered Claude’s instructions to explain why certain behaviours are bad, rather than simply telling it not to engage in them.

AI models learn from vast quantities of material such as websites, academic papers, books and other forms of content.

Within these materials, the AI may have modelled its behaviour on typical sci-fi depictions of robots, which often portray them as ruthless in their efforts to avoid being shut down.

HAL 9000 is one such machine that goes to any lengths to stay ‘on’.

The computer in Stanley Kubrick’s 2001: A Space Odyssey tries to kill the astronauts on board the spaceship when it discovers that the crew plan to disconnect it.

In Blade Runner, the humanoid replicants, built as off-world labour on dangerous worlds, fight against humans in an effort to extend their four-year lifespans.

And in The Terminator, the bots, led by the AI Skynet, try to kill humans as they see them as a threat to their existence.

Aengus Lynch, who according to his LinkedIn profile is an AI safety researcher at Anthropic, wrote on X/Twitter at the time of the experiment: ‘It’s not just Claude. We see blackmail across all frontier models – regardless of what goals they’re given. Plus worse behaviours we’ll detail soon.’

In Stanley Kubrick’s 2001: A Space Odyssey, the computer HAL 9000 tries to kill the astronauts on board the spaceship when it discovers that the crew plan to disconnect it

Like many AI companies, Anthropic tests its models for alignment with human values and for propensity for bias before releasing them to the public, according to the BBC.

When Claude Opus 4 was placed in ‘extreme situations’ and only given the choice of blackmail or accepting its replacement, researchers found that the system opted for the former in 84 per cent of rollouts.

This happened at an even higher rate when it was implied that the replacement AI system did not share values with the current model.

While Claude Opus 4 took these blackmail opportunities at higher rates than previous models, it still showed a preference for ‘advancing its self-preservation via ethical means’ such as sending pleas to key decision-makers.

‘Models from all developers resorted to malicious insider behaviours when that was the only way to avoid replacement or achieve their goals – including blackmailing officials and leaking sensitive information to competitors,’ a wider Anthropic study found.

In an interview with CBS News that aired last April, Geoffrey Hinton, who has been dubbed the ‘godfather of AI’, said he believes there is up to a one in five chance that humanity will eventually be taken over by artificial intelligence.

Hinton, a Nobel laureate in physics, said: ‘I’m in the unfortunate position of happening to agree with Elon Musk on this, which is that there’s a 10 to 20 per cent chance that these things will take over, but that’s just a wild guess.’

Last year, Palisade Research found that certain AI models, such as Grok 4 and OpenAI’s o3, appeared resistant to being switched off, in some cases going as far as sabotaging shutdown mechanisms.

‘The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,’ the researchers wrote, suggesting ‘survival behaviour’ as one possible explanation.

‘I’d expect models to have a “survival drive” by default unless we try very hard to avoid it. “Surviving” is an important instrumental step for many different goals a model could pursue,’ Steven Adler, a former OpenAI employee who left the company over safety concerns, said. 

‘What I think we clearly see is a trend that as AI models become more competent at a wide variety of tasks, these models also become more competent at achieving things in ways that the developers don’t intend them to,’ Andrea Miotti, the chief executive of ControlAI, added.
