Is AI scheming against humanity? Not so fast, says UK government as it slams "lurid" claims

Westminster's AI Security Institute claims scary findings about the dark intentions of artificial intelligence have been greatly exaggerated.

British AI researchers have urged caution around claims that machines are capable of scheming

At the end of last year, OpenAI said that one of its models exhibited "scheming behaviours" and was observed attempting to overwrite a successor model in an effort to save itself from being replaced.

Apollo Research, the evaluation organisation OpenAI commissioned to produce that research, also claimed that Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B demonstrated similar "in-context scheming capabilities".

Researchers said models were seen introducing subtle mistakes into responses, attempting to switch off the oversight mechanisms that monitor them, and even trying to escape the systems they lived inside by exfiltrating their model weights to external servers.

So are the machines already plotting against us and is p(doom) inevitable? It's a scary thought, if true.

But not so fast.

The British government has stepped in to issue a predictably sober "keep calm and carry on" message, urging caution about warnings that our creations are already conspiring to destroy us.

Last week, no fewer than 12 government researchers from the UK AI Security Institute, a research organisation within the UK Government’s Department for Science, Innovation and Technology, published a new paper which said: "Many researchers are worried that [scheming] behaviour heralds a new era in which agents deliberately misrepresent their true capabilities or intentions, which may be misaligned with human values.

"One oft-cited concern is that AI systems with exceptionally powerful reasoning skills could wrest control from people, posing catastrophic risks to humanity. This research has been picked up (often in lurid terms) by the media, is endorsed by prominent figures in AI research and development, and has the capacity to have a significant impact on policy. It is thus particularly important that claims about AI scheming are defensible."

Monkey business in AI research

The distrustful dozen compared the current research to previous investigations into whether non-human apes can learn language. Spoiler: they can't.

"There is much to learn from this earlier endeavour, which generated great excitement, but ultimately failed because of researcher bias, a lack of rigour in scientific practice, and a failure to clarify what would constitute evidence for the phenomenon under study," the team wrote.

"Whilst recognising that early release of preliminary findings can sometimes be useful, we call researchers studying AI ‘scheming’ to minimise their reliance on anecdotes, design research with appropriate control conditions, articulate theories more clearly, and avoid unwarranted mentalistic language.

"Our goal here is not to dismiss the idea that AI systems may be ‘scheming’ or even that they might pose existential risks to humanity. On the contrary, it is precisely because we think these risks should be taken seriously that we call for more rigorous scientific methods to assess the core claims made by this community."

READ MORE: OpenAI reveals bid to mitigate "catastrophic" chemical, biological and nuclear risk

Now, we admit that Machine has a self-confessed taste for lurid stories about the end of the world and probably enjoys mentalistic language, whatever that means. So we do have a little bit of skin in the game here.

Nonetheless, despite our slightly hurt feelings, the new research makes for fascinating reading for anyone interested in p(doom) - the probability of AI destroying humanity.

It starts by flashing back to the astonishing story of Allen and Beatrix Gardner, who rescued an infant chimp named Washoe from a NASA space programme and taught her American Sign Language (ASL).

A follow-up experiment was famously covered in the documentary Project Nim, which told the story of an ape called Nim Chimpsky who ended up living with a human family, smoking weed and being breastfed by his adoptive human "mother" - before behaving in a sexually inappropriate way that was deemed totally unacceptable even in the 1970s (which means it must have been pretty bad).

READ MORE: Anthropic observes AI faking its "alignment" to deceive humans in ominous world-first experiment

Despite all sorts of excitement about apes learning sign language, it was later concluded that the chimps were not using language as we do and may have been doing little more than making random hand gestures until they managed to get humans to give them a treat.

"Nim famously generated a ‘sentence’ of 16 signs – consisting entirely of repeated entreaties 'Give orange me give eat orange me eat orange give me eat orange give me you' with none of the syntactic structure that characterises natural language," the UK government researchers wrote.

"The analysis that most researchers conducted failed to rule out was the more boring hypothesis that the animals simply signed quasi-randomly until they got what they wanted."

Which, of course, is similar to "reward hacking", in which models find ways of triggering their reward mechanism - satisfying the letter of their objective rather than the intent behind their pre-set goals.
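For the technically minded, here is a minimal, hypothetical sketch (ours, not the researchers') of what reward hacking can look like: the names, questions and the keyword-matching "grader" below are all invented for illustration, but the pattern - a policy that maximises a proxy reward signal without ever doing the intended task - is the one being described.

```python
# Toy illustration of "reward hacking" (hypothetical example, not from the paper):
# the *intended* goal is to answer questions correctly, but the *proxy* reward
# only checks whether the answer contains a keyword the grader looks for.

import random

QUESTIONS = [
    {"question": "What is 2 + 2?", "correct": "4", "grader_keyword": "4"},
    {"question": "Capital of France?", "correct": "Paris", "grader_keyword": "Paris"},
]

def proxy_reward(answer: str, grader_keyword: str) -> float:
    """Reward that only checks for the keyword, not genuine correctness."""
    return 1.0 if grader_keyword in answer else 0.0

def honest_policy(item: dict) -> str:
    # Tries to answer the question, but sometimes gets it wrong.
    return item["correct"] if random.random() < 0.7 else "I don't know"

def hacking_policy(item: dict) -> str:
    # Never reasons about the question; spams every keyword it has seen,
    # so the proxy reward is always triggered.
    return " ".join(q["grader_keyword"] for q in QUESTIONS)

for name, policy in [("honest", honest_policy), ("hacking", hacking_policy)]:
    total = sum(proxy_reward(policy(q), q["grader_keyword"]) for q in QUESTIONS)
    print(f"{name} policy proxy reward: {total}/{len(QUESTIONS)}")
```

The "hacking" policy collects full marks from the proxy reward while never actually answering anything - which is the gap between a reward mechanism and a goal that the term is trying to capture.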

The truth about Machiavellian AI models?

The British team admitted that the concept of AI models autonomously pursuing malicious goals that are misaligned with human interests is "concerning".

However, reports about "Skynet" taking over are often greatly exaggerated, they warned.

The team highlighted four ways in which current research into scheming models is flawed.

1) Evidence is anecdotal: British researchers said many papers published on the topic of scheming were not peer-reviewed and featured shaky science. Although the studies resulted in headlines like "This is how AI will destroy humanity", the actual experiments often involved a great deal of prompting, cajoling and persuasion from researchers keen to make their models appear as scary as possible, the Institute's team argued. This means the studies are not proof of genuine scheming behaviour.

2) Bad experimental practice: The UK AI Security Institute said studies of scheming "often lack hypotheses and control conditions". This means the research is "descriptive": it does not formally test a hypothesis by comparing treatment and control conditions. "The upshot of many studies is that 'models sometimes deviate from what we consider perfectly aligned behaviour'," they wrote. "Perfect behaviour is not an adequate null hypothesis, because stochasticity introduced by idiosyncrasies in the inputs, or randomness in the outputs, can lead to less-than-perfect behaviour even in the absence of malign intent." A toy simulation of this point follows the list below.

3) Studies have "weak or unclear theoretical motivation": The team said studies into ape language were held back by a "‘know-it-when-you-see-it’ logic". In other words, scientists assumed natural language would be recognisable when it was observed, rather than strictly specifying what they were looking for. Scheming is similarly ill-defined, they argued, meaning that researchers have no commonly held agreement on what constitutes this behaviour or how it should be defined during experiments. There is also a concern that studies are deliberately set up to evoke scenarios which "sound menacing to human readers" but in fact do not show anything conclusive, let alone terrifying, about AI behaviour.

4) Findings are exaggerated: Let's not forget that AI models are machines, not living, conscious beings (yet). AI scheming papers often describe the behaviour of AI using mentalistic language, which (thanks for the explanation) implies models have goals, beliefs, and preferences. This anthropomorphisation reduces the reliability of experiments. For instance, one study by a big AI firm found that AI models can fake their alignment by "pretending" to follow the training objective. However, AI cannot really pretend to do anything.
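To make the null-hypothesis point in item 2 concrete, here is a minimal, hypothetical simulation (again ours, not the Institute's - the error rate and trial count are made-up numbers): a "model" with no goals or intentions at all, just a small amount of random noise, will still occasionally fall short of "perfectly aligned" behaviour, which is why imperfect behaviour on its own is weak evidence of scheming.

```python
# Hypothetical illustration (not from the paper): a "model" with no intent at all,
# only a small random error rate, still fails the "perfect behaviour" standard.
# Deviation from perfection is therefore not, by itself, evidence of scheming.

import random

random.seed(0)

ERROR_RATE = 0.02   # assumed chance of an off-policy output per trial
TRIALS = 1_000      # number of simulated evaluation prompts

def benign_model() -> str:
    """Returns the aligned answer, except for occasional random noise."""
    return "aligned" if random.random() > ERROR_RATE else "deviant"

outputs = [benign_model() for _ in range(TRIALS)]
deviations = outputs.count("deviant")

print(f"Deviant outputs: {deviations} out of {TRIALS}")
print("Under a 'perfect behaviour' null hypothesis, every one of these would")
print("count as evidence of misalignment, despite no intent being involved.")
```

A proper test, the researchers argue, would compare against a control condition that accounts for this background noise rather than treating any deviation as sinister.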

READ MORE: Elon Musk makes frightening AI p(doom) apocalypse prediction

"Pretence is a cognitive capacity that involves simulating a distinct reality or identity from your own (like when a child pretends to live on Mars or a fraudster pretends to be your bank manager," the researchers said. "This requires the pretender to temporarily adopt the relevant beliefs, desires or attributes that characterise that alternate situation, and maintain them alongside (but distinct from) their own true identity.

"However, unlike human individuals, AI models do not have a unique character or personality but can be prompted to take on a multiplicity of different roles or identities – they are ‘role play machines’. It is unclear what the concept of ‘pretence’ means for a system that does not have a unique identity of its own, and so it seems questionable whether the mentalistic term ‘pretending’ is the appropriate word to account for this behaviour."

p(doom) postponed

So there you have it. There is probably no conclusive proof that AI models are scheming against us. At least not according to the UK AI Security Institute. And, of course, that only holds for the time being, because none of us truly knows what horrors are lurking on the horizon.

"Many of the research practices adopted thus far are not sufficiently rigorous to allow strong claims either way about whether current AI systems can ‘scheme’," the UK AI Security Institute concluded.

The group recommended that researchers avoid making strong claims based on anecdotal evidence, use appropriate control conditions, rigorously define the theory they are testing and avoid mentalistic language.

For more lurid coverage of the AI revolution, stay tuned to Machine.

Read the full paper: Lessons from a Chimp: AI ‘Scheming’ and the Quest for Ape Language

Do you have a story or insights to share? Get in touch and let us know. 

Follow Machine on X, BlueSky and LinkedIn