Tuesday, April 21, 2026

AI is just one year away from beating ‘Humanity’s Last Exam’

AI will be ready to score full marks on one of the world’s most challenging knowledge tests branded Humanity’s Last Exam (HLE) in a matter of months, developers claim.

HLE was set up by tech bosses to see just how intelligent their systems are and consists of 2,500 meticulously chosen questions, spanning around a hundred topics from rocket science and mythology to physiology.

Each one requires at least PhD-level understanding, and a score even close to 100 per cent would earn someone the title of a ‘universal expert’.

Just two years ago, the much-vaunted ChatGPT system from OpenAI scored a measly 3 per cent on the exam, with its rivals at Google and Anthropic not doing much better.

The test served to assuage fears over the growing dominance of AI, with researchers claiming it proved ‘a marked gap’ remained between large language models (LLMs) and the world’s finest academics.

But the seemingly impossible HLE may prove to be just another milestone in AI’s unstoppable rise. 

Google Gemini scored an impressive 45.9 per cent on the exam last month, having already soared to 18.8 per cent within months of its first attempt.

And full marks are on the horizon, according to Calvin Zhang, the research lead at Scale, the AI company behind HLE.

‘We wanted to create this close-ended academic benchmark, set to the frontier of expert humans, that only a handful of people on earth can really solve,’ he said.

‘We’ve seen over the past few years insane progress on these language models. It’s impressive, model builders have really done a great job at improving these reasoning models.’

Kate Olszewska, a product manager at Google DeepMind, added: ‘If we truly cared about this as the only thing in life, I think we could get to it pretty quickly.’

Anthropic – the company behind the Claude AI system – has achieved a score of 34.2 per cent in HLE and is improving its marks at a rapid pace.

AI returning a score of 100 per cent in the exam would be a significant development given the test is ‘designed to be the final closed-ended academic benchmark of its kind’, according to its authors.

It means that if the technology cracks the HLE, it will need to be tested on questions no human knows the answer to in future.

The test was created by researchers at Scale and the Center for AI Safety, a non-profit organisation, to examine both the AI’s breadth of knowledge and its depth of reasoning.

Experts from roughly 50 countries submitted 70,000 questions for consideration in response to a global appeal in September 2024 which offered a $500,000 prize pot.

They had to require a short unambiguous answer and be difficult to find on the internet.

The list was whittled down to 13,000 after questions which any existing model could answer were removed from consideration.

Some of the 2,500 that were chosen have since been removed or edited following feedback from users. 

They require a wide range of expertise – from knowledge of biology to proficiency in languages – and a large number of them have remained secret in a bid to stop systems benefiting from answers being publicly discussed online.

Success in HLE would evoke memories of IBM’s supercomputer Deep Blue defeating world chess champion Garry Kasparov in a match in 1997, confounding most experts’ predictions.

Since then, a string of major AI benchmarks have been cleared including the multi-disciplinary Massive Multitask Language Understanding, released in 2020, which was canned after systems began finding it too easy, often scoring above 90 per cent.

As AI approaches the stage where it can master human-made tests, expanding beyond the existing limits of human knowledge has increasingly become the main focus of developers, Ms Olszewska added.

But there will always be room for human specialism, according to Zhang, with physical fields such as surgery, as well as decision-based skills including judgment and creativity, harder for AI to master.
