Could Claude Opus 4 help build bioweapons? Anthropic cannot rule out catastrophic misuse risk
AI firm activates new security safeguards amid fears over the growing threat posed by rapidly evolving model capabilities.

Anthropic has admitted it cannot totally rule out the risk of Claude Opus 4 being misused to acquire or develop chemical, biological, radiological, or nuclear weapons.
The AI firm described Claude Opus 4 as "the world’s best coding model", offering "sustained performance on complex, long-running tasks and agent workflows".
This new model has smashed benchmarks, hitting 72.7% on SWE-bench, and is designed to tackle tricky problem-solving tasks.
But Claude Opus 4 is so powerful that Anthropic activated a security mechanism for the first time to mitigate the risk of it being used to create weapons of mass destruction.
Anthropic has not said that a bedroom nihilist could use Claude Opus 4 to spin up a nuke at home and then wipe out human civilisation. Far from it, in fact.
However, the new model has passed the threshold required to trigger the company's AI Safety Level 3 (ASL-3) Deployment and Security Standards.
"ASL-3 refers to systems that substantially increase the risk of catastrophic misuse... or show low-level autonomous capabilities," Anthropic previously wrote.
It believes that biological weapons "account for the vast majority of the risk" potentially posed by the model, although it is evaluating a "potential expansion in scope" to other weapons.
Protecting the world from AI misuse
In an announcement detailing the beefed-up security measures, Anthropic said the ASL-3 Security Standard makes it "harder to steal model weights", which should stop bad actors from copying the underlying parameters that give the model its capabilities.
A corresponding "Deployment Standard" has also been activated to "limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear weapons (CBRN)."
"We are deploying Claude Opus 4 with our ASL-3 measures as a precautionary and provisional action," it wrote. "To be clear, we have not yet determined whether Claude Opus 4 has definitively passed the Capabilities Threshold that requires ASL-3 protections.
"Rather, due to continued improvements in CBRN-related knowledge and capabilities, we have determined that clearly ruling out ASL-3 risks is not possible for Claude Opus 4 in the way it was for every previous model, and more detailed study is required to conclusively assess the model’s level of risk."
The truth about catastrophic risk
Whilst words like "catastrophic" sound scary, at this stage Claude Opus 4 could, at most, help experts with deep knowledge build weapons of mass destruction - and even that is not certain.
Version 4 of the model showed “substantially greater capabilities in CBRN-related evaluations” than previous models, including “stronger performance on virus acquisition tasks, more concerning behaviour in expert red-teaming sessions, and enhanced tool use and agentic workflows," according to an Anthropic report.
In other words, it behaves in a way that is a little more unnerving than older models and appears to be better at tasks that would be useful in designing a bioweapon. Biological weapons are the most likely threat because they require fewer resources and less specialised equipment than nuclear weapons.
Again, this does not mean that Claude Opus 4 will let some crazed terrorist cook up airborne Ebola in their mother's basement.
“The CBRN capability threshold for the ASL-3 Standard focuses on individuals or groups with basic technical backgrounds (e.g. undergraduate STEM degrees) attempting to use AI models to significantly help them create/obtain and deploy CBRN weapons,” Anthropic added.
It noted: "The processes needed to generate these threats are knowledge-intensive, skill-intensive, prone to failure, and frequently have one or more bottleneck steps."
So at this stage, Anthropic is taking a belt and braces approach to security, which is to be welcomed.
"Proactively enabling a higher standard of safety and security simplifies model releases while allowing us to learn from experience by iteratively improving our defenses and reducing their impact on users," it said.
What are Anthropic's AI Safety Level 3 Protections?
Switching on ASL-3 protections involves implementing “deployment measures” that are “narrowly focused” on preventing the model from assisting with the creation of CBRN weapons.
Security safeguards include limiting universal jailbreaks, which Anthropic described as “systematic attacks that allow attackers to circumvent our guardrails”.
“We have developed a three-part approach: making the system more difficult to jailbreak, detecting jailbreaks when they do occur, and iteratively improving our defences,” it added.
The AI company has implemented “Constitutional Classifiers”, in which real-time classifier guards trained on synthetic data monitor model inputs and outputs to block a “narrow class of harmful CBRN information” - meaning the system shouldn't be too censorious and refuse innocent requests.
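For illustration only, here is a minimal sketch of what classifier-gated inference can look like in principle. Every name in it - the score_cbrn_risk stub, the keyword list, the threshold - is invented for this example, and Anthropic has not published how its Constitutional Classifiers are built; the point is simply that a guard sits on both the prompt and the completion.

```python
# Hypothetical sketch of classifier-gated inference. This is NOT Anthropic's
# implementation; the risk scorer below is a keyword stub standing in for a
# trained guard model.

from dataclasses import dataclass


@dataclass
class GateResult:
    allowed: bool
    reason: str
    text: str


def score_cbrn_risk(text: str) -> float:
    """Placeholder risk score; a real guard would be a learned classifier."""
    flagged_terms = ("synthesise pathogen", "weaponise agent")  # illustrative only
    return 1.0 if any(term in text.lower() for term in flagged_terms) else 0.0


def classifier_gated_completion(prompt: str, generate, threshold: float = 0.5) -> GateResult:
    """Wrap an arbitrary generate() callable with input and output guards."""
    if score_cbrn_risk(prompt) >= threshold:           # input-side guard
        return GateResult(False, "prompt flagged by input classifier", "")
    completion = generate(prompt)                      # underlying model call
    if score_cbrn_risk(completion) >= threshold:       # output-side guard
        return GateResult(False, "completion flagged by output classifier", "")
    return GateResult(True, "passed both guards", completion)


if __name__ == "__main__":
    # A harmless request passes both guards; the stub "model" just echoes.
    result = classifier_gated_completion("Explain how vaccines work", lambda p: f"Answer to: {p}")
    print(result.allowed, result.reason)
```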
Anthropic also instituted a “wider monitoring system”, including a bug bounty programme focused on stress-testing the Constitutional Classifiers, offline classification systems, and threat intelligence partnerships to “quickly identify and respond to potential universal jailbreaks that would enable CBRN misuse”.
"Our approach involves more than 100 different security controls that combine preventive controls with detection mechanisms, primarily targeting threats from sophisticated non-state actors from initial entry points through lateral movement to final extraction," it continued.
Do you have a story or insights to share? Get in touch and let us know.