ChatGPT Agent excels at finding ways to "cause most harm with least effort", OpenAI reveals

AI firm refuses to rule out possibility of the new agentic model being misused to help spin up biological and chemical weapons.

AI models have the potential to become frighteningly adept at helping to build lethal chemical and biological weapons (Photo by Brian Wangenheim on Unsplash)
AI models have the potential to become frighteningly adept at helping to build lethal chemical and biological weapons (Photo by Brian Wangenheim on Unsplash)

OpenAI has released a new model called ChatGPT Agent that is capable of acting semi-autonomously to perform complex workflows including coding, web browsing and deep research.

But with great power comes great responsibility. As it released Agent, OpenAI issued a frank and slightly alarming warning about its potentially harmful capabilities, stating there is a non-negligible chance that it could be misused to help low-skilled bad actors build chemical or biological weapons.

Although OpenAI does not have "definitive evidence that this model could meaningfully help a novice to create severe biological harm", it ruled that Agent has passed the "high capability" rating set out in its Preparedness Framework, which includes a range of catastrophic and potentially existential risk scenarios.

This decision was taken as a "precautionary" measure. The high capability category means that Agent cannot spin up brand-new types of bioweapons and artificial mega-viruses, but may be able to help a relative novice create previously known chemical or biological threats.

OpenAI described Agent as "a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths."

On X, Sam Altman, boss of the AI firm, wrote: "Watching ChatGPT agent use a computer to do complex tasks has been a real 'feel the AGI' moment for me; something about seeing the computer think, plan, and execute hits different."

Ctrl-Altman-delete for humanity?

However, Altman also issued a stern warning about the risks of letting Agent loose.

He added: "Agent represents a new level of capability for AI systems and can accomplish some remarkable, complex tasks for you using its own computer... Although the utility is significant, so are the potential risks.

"We have built a lot of safeguards and warnings into it, and broader mitigations than we’ve ever developed before from robust training to system safeguards to user controls, but we can’t anticipate everything."

He advised against using Agent in "high-risk cases" or giving it access to dangerous amounts of personal information until further testing is carried out in the wild.

He continued: "We don’t know exactly what the impacts are going to be, but bad actors may try to 'trick' users’ AI agents into giving private information they shouldn’t and take actions they shouldn’t, in ways we can’t predict.

"We recommend giving agents the minimum access required to complete a task to reduce privacy and security risks."

Mitigating ChatGPTerrorism risks

In the System Card for Agent, OpenAI set out a range of risks - starting with the admission that the "impact of prompt injections for ChatGPT agent could be higher than for previous launches" because it has access to more tools.

During data exfiltration tests, it leaked sensitive info in about 22% of cases - a better performance than Operator-4o but very slightly worse than Operator 03, versions of OpenAI's previous agents.

When it comes to bioweapons, OpenAI explained that Agent can "effectively synthesise published literature on modifying and creating novel threats" but did not find a "significant uplift" in its ability to build "novel, feasible, and dangerous threats".

However, red-teaming efforts found that Agent could "reduce operational challenges for malicious actors" and discovered it was more effective at devising dastardly harmful plans than previous models.

READ MORE: OpenAI reveals bid to mitigate "catastrophic" chemical, biological and nuclear risk

"For instance, it demonstrated higher accuracy than previously released models (like o3) in identifying the most effective, actionable avenues for causing the most harm with the least amount of effort," researchers wrote.

During extensive testing by biosecurity and chemistry experts specialising in national security, Agent was not found to pass into the dreaded critical threat category.

"Experts identified substantial potential for ChatGPT Agent to significantly uplift users’ capabilities, particularly benefiting graduate students and cross-disciplinary researchers with existing lab experience and judgment," OpenAI wrote.

"The system rapidly consolidates complex knowledge about pathogen modification methods, experimental protocols, and equipment sourcing, potentially compressing days of research into minutes."

READ MORE: OpenAI delays open-weight model release: What are the potential catastrophic and existential risks of unclosed AI?

However, the model's tendency to hallucinate could be a life-saver for our species because it "still provided incorrect details that could reasonably set back semi-experienced actors by months and cost thousands of dollars"

OpenAI appears to have put a decent amount of effort into identifying "weaponization pathways" and other risks so that it can close down potential dangers before the bad guys find them.

Agent cannot be used to gamble, buy or sell regulated goods and assist with high-consequence financial activities such as transferring cash between accounts, OpenAI said.

READ MORE: Which jobs are safe from AI? OpenAI boss Sam Altman shares a rare sunbeam of optimism

Now that we have dug into the risks around Agent, we'd like to share a little word about why we cover stories from the angles that we do.

You may notice that we write a lot about existential risk (or x-risk, as it's also known), primarily because it's an interesting topic and we like stories about doom, drama and the end of everything. It's sort of tabloid fare we love.

It's worth noting that OpenAI is relatively, well, open compared to some of its peers, so its apparent honesty around the catastrophic risks of models is certainly to be noted.

Read the system card for ChatGPT Agent.

Do you have a story or insights to share? Get in touch and let us know. 

Follow Machine on XBlueSky and LinkedIn