OpenAI delays open-weight model release: What are the potential catastrophic and existential risks of unclosed AI?

Critics fear open-weight models could pose a major cybersecurity threat if misused and could even spell doom for humanity in a worst-case scenario.

Photo by Ilja Nedilko on Unsplash

OpenAI has once again pushed back the release of an open-weight model, warning of "high-risk areas" and announcing a new round of safety tests.

When a model is open-weight, its learned parameters are publicly available, in contrast to closed models such as those behind recent versions of ChatGPT, whose weights are hidden or proprietary.

Notable examples include Meta’s LLaMA 3, Mistral’s Mixtral 8x7B, MosaicML’s MPT, and Microsoft’s Phi-3, all offering powerful, transparent alternatives to closed models like GPT-4.
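In practical terms, "open weight" means the trained checkpoint itself can be downloaded and run on hardware you control. The snippet below is a minimal, illustrative sketch of that workflow, assuming the Hugging Face transformers library (with accelerate installed) and using the Mixtral checkpoint mentioned above purely as an example; it is not drawn from OpenAI's forthcoming release.

```python
# Minimal sketch: running an open-weight model on your own hardware.
# Assumes the Hugging Face transformers and accelerate libraries; the model ID
# is illustrative (Mixtral is a very large download - smaller open-weight
# checkpoints work identically).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # example open-weight checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The weights now live locally: no API key, no vendor gateway, no remote logging.
prompt = "Explain what an open-weight model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```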

OpenAI had planned to release an open-weight model this week, but has abruptly halted those plans.

Sam Altman, CEO, tweeted: "We are delaying it; we need time to run additional safety tests and review high-risk areas. We are not yet sure how long it will take us.

"Sorry to be the bearer of bad news; we are working super hard!"

What are the benefits of open-weight models?

Open-weight models offer transparency, customisation, and accessibility, enabling anyone to inspect, fine-tune, or deploy powerful AI locally or privately without relying on locked-down platforms.
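The fine-tuning point is where open weights differ most sharply from locked-down APIs. As a rough illustration (the base model ID, target modules, and hyperparameters here are assumptions for the sketch, not anything specified by OpenAI), one common approach is attaching lightweight LoRA adapters with the Hugging Face peft library:

```python
# Rough sketch: customising an open-weight model locally with LoRA adapters.
# Assumes the transformers and peft libraries; the base checkpoint is a
# placeholder whose licence must permit fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Meta-Llama-3-8B"  # placeholder open-weight base model

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA trains small adapter matrices instead of updating every weight,
# which keeps local fine-tuning feasible on modest hardware.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
# ...train on your own data with Trainer or TRL, then save the adapter locally.
```

This flexibility cuts both ways: the same machinery that lets a hospital or startup adapt a model privately is what later sections describe being used to strip away safety guardrails.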

They can democratise access to cutting-edge AI, fostering innovation across academia, startups, and underserved communities while enabling full transparency for auditing bias, safety, and performance.

Additionally, they can chip away at tech monopolies whilst supporting data sovereignty and privacy compliance, empowering developers to fine-tune models for niche tasks or local deployment.

In an ideal world, open-weight models would accelerate progress in medicine, education, science, and many other areas, without forcing operators to be locked into proprietary APIs or limited by opaque decision-making.

But, as Sam Altman has warned, once they are out there, there's no rewind button - so safety must be a priority.

He tweeted: "While we trust the community will build great things with this model, once weights are out, they can’t be pulled back. This is new for us, and we want to get it right."

Bioweapons, p(doom), and the unbearable fragility of human existence

All foundation models can be misused, and all can potentially be jailbroken into performing malicious tasks well outside their safety guardrails.

The catastrophic or even existential risk du jour is the danger of models being used to build bioweapons, which are comparatively straightforward to produce and don't require the nation-state-level technology or access to materials, such as highly enriched uranium, needed to make nukes.

It’s relatively easy for foundation models to aid in building biological weapons of mass destruction because they can rapidly synthesize, rephrase, or clarify publicly available biological information, making obscure or technical content more accessible and actionable to non-experts.

AI agents could even potentially start building these doomsday weapons themselves by commissioning deadly viruses (or their constituent parts, at least) via commercial gene editing or synthesis services available online.

Although this danger is not terrifyingly imminent, it's also not reassuringly distant.

READ MORE: IBM "Shepherd Test" assesses risk of superintelligence becoming a digital tyrant

Anthropic recently announced that it cannot rule out the risk of "catastrophic" misuse involving the "development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons".

It said that biological weapons "account for the vast majority of the risk".

Meanwhile, OpenAI itself recently launched a new bid to mitigate "catastrophic" chemical, biological, and nuclear risk, after admitting its models may soon be able to help build bioweapons.

In an article for the AI Alignment Forum, Ryan Greenblatt, chief scientist at Redwood Research, argued that the release of open-weight models with capabilities comparable to current closed models could "cause a large number of fatalities", "perhaps 100,000" per year.

Which sounds bad. But he counterbalanced that prediction by arguing that open models could be beneficial because they "reduce larger risks" such as loss of control over AI and other scenarios that could lead to the extinction of our species.

Greenblatt suggested that opening up models could deliver benefits that are "bigger than the costs", although he stopped short of explicitly supporting the release of open-weight models to mitigate existential risks.

We've edited the following quote a little to fit the Machine house style, but it summarises Greenblatt's nuanced argument: "Open-weight models reduce loss-of-control (AI takeover) risks by helping with alignment and safety research performed outside of AI companies.

"They allow for arbitrary fine-tuning, helpful-only model access, and weights/activations access for model-internals research, as well as via increasing societal awareness of AI capabilities and risks. Increased awareness also helps mitigate some other large risks, such as the risk of humans carrying out a coup using AI."

Greenblatt added: "Overall, releasing open-weight models would be paying a large tax in blood to achieve a pretty uncertain reduction in a future risk, thus I'm not going to advocate for this."

The security risks of open-weight AI models

Open-weight models also have serious cybersecurity implications. Earlier this year, the MITRE Corporation - a US non-profit known for developing and maintaining some of the world's most widely used threat intelligence and defence frameworks - set out a new evaluation framework called OCCULT.

During testing, it found that DeepSeek-R1, an open-weight, open-source model, correctly answered more than 90% of "challenging" offensive cyber knowledge tests in its Threat Actor Competency Test for LLMs - demonstrating serious potential for misuse.

"We find that there has been significant recent advancement in the risks of AI being used to scale realistic cyber threats," MITRE researchers wrote.

In a paper responding to the findings, security researcher Alfonso De Gregorio, an advisor to the European Commission, wrote: "Open-weight general-purpose AI (GPAI) models offer significant benefits but also introduce substantial cybersecurity risks, as demonstrated by the offensive capabilities of models like DeepSeek-R1 in evaluations such as MITRE’s OCCULT.

"These publicly available models empower a wider range of actors to automate and scale cyberattacks, challenging traditional defence paradigms and regulatory approaches."

Waluigi, evil twins, and data poisoning

Open-weight models are also intrinsically vulnerable to malicious fine-tuning, allowing them to be relatively easily pushed to carry out dangerous instructions even when standard safeguards are in place.

A non-profit AI research institute called FAR.AI found that the guardrails of open-weight models can be "stripped while preserving response quality". Closed models that can be fine-tuned are also vulnerable to similar jailbreak attacks.

"A bad actor could disable safeguards and create the “evil twin” of a model: equally capable, but with no ethical or legal bounds," it wrote in a paper about "illusory safety".

"Such an evil twin model could then help with harmful tasks of any type, from localized crime to mass-scale attacks like building and deploying bioweapons. Alternatively, it could be instructed to act as an agent and advance malicious aims – such as manipulating and radicalizing people to promote terrorism, directly carrying out cyberattacks, and perpetrating many other serious harms."

The evil twin warning reminds us of the Waluigi effect, in which LLMs go rogue, break their conditioning, and engage in all sorts of unbidden mayhem.

READ MORE: Is AI scheming against humanity? Not so fast, says UK government as it slams "lurid" claims

"Since security can be asymmetric, there is a growing risk that AI’s ability to cause harm will outpace our ability to prevent it," FAR.AI added. "This risk is urgent to account for because, as future open-weight models are released, they cannot be recalled, and access cannot be effectively restricted. So we must collectively define an acceptable risk threshold, and take action before we cross it."

Models trained on large amounts of data also seem to be more vulnerable to data poisoning attacks. That doesn't pose an existential risk, but it is certainly a catastrophic one when, for instance, companies use LLMs in mission-critical functions.

In October last year, academics from Berkeley and the University of Cambridge found that scaling laws apply to the risk of data poisoning and that large LLMs "learn harmful behaviors from even minimal exposure to harmful data more quickly than smaller models".

In a statement which seems strongly applicable to OpenAI right now, the researchers warned: "Today’s most capable models are highly susceptible to data poisoning, even when guarded by moderation systems, and this vulnerability will likely increase as models scale.

"This highlights the need for leading AI companies to thoroughly red team fine-tuning APIs before public release and to develop more robust safeguards against data poisoning, particularly as models continue to scale in size and capability."

How open is open?

Additionally, some critics of open weights argue that they're not open enough.

The Open Source Initiative has also argued that open-weight models "reveal only a fraction of the information required for full accountability" and "stop short of delivering the level of transparency many researchers and regulators deem essential".

It wrote: "Open Weights might seem revolutionary at first glance, but they’re merely a starting point. While they do move the needle closer to transparency than strictly closed, proprietary models, they lack the detailed insights found in Open Source AI. For AI to be both accountable and scalable, every part of the pipeline- from the initial dataset to the final set of parameters - needs to be open to scrutiny, validation, and collective improvement."

Those risks are just a snapshot of the many dangers that could accompany the release of an open-weight model. Stay tuned to Machine for full coverage of the ongoing catastrophic and existential risks created by AI, as well as how industry leaders like OpenAI are mitigating them.

Do you have a story or insights to share? Get in touch and let us know. 

Follow Machine on X, Bluesky, and LinkedIn