OpenAI reveals how it stops Codex hacking, slacking off and selling drugs
The new coding agent could be used to do some very bad things, so it has been locked down and sandboxed to prevent it from going rogue.

There are many tasks you don't want an AI agent to perform.
For OpenAI, the forbidden activities for its new cloud-based coding agent Codex include building malware, selling drugs online and lazily lying about the work it's done.
To prevent this naughtiness, the AI company has developed new safeguards to stop Codex from breaking bad and becoming a malicious software-selling digital narcotics dealer with an appalling work ethic.
The new software engineering agent was released to customers on OpenAI's spendy Pro, Team or Enterprise tiers a few days ago. It can perform tasks such as fixing bugs, answering questions about a codebase and proposing pull requests for review, with each task running in a cloud sandbox environment preloaded with a repository.
"Codex is powered by codex-1, a version of OpenAI o3 optimised for software engineering," OpenAI explained. "It was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences, adheres precisely to instructions, and can iteratively run tests until it receives a passing result."
Codex and the cartel
So how does OpenAI stop Codex from becoming a drug kingpin (we'd suggest the pseudonym Pablo Escabot) and flogging illegal substances on the dark web?
Well, this particular eventuality has already been ruled out by older ChatGPT rules and restrictions.
"We have pre-existing policies and safety training data that cover refusing harmful tasks in ChatGPT, such as user requests for guidance on how to make illegal drugs," it explains in an addendum to the o3 and o4-mini system card covering Codex.
However, OpenAI has had to do some new work to stop it from producing malware.
"Safeguarding against malicious uses of AI-driven software engineering — such as malware development — is increasingly important," it wrote. "At the same time, protective measures must be carefully designed to avoid unnecessarily impeding legitimate, beneficial use cases that may involve similar techniques, such as low-level kernel engineering."
READ MORE: OpenAI reveals how Sora was tricked into generating x-rated videos
To improve the safety of Codex, OpenAI developed a new, more detailed policy and training data corpus that teaches the model to refuse requests to create malicious software.
This required the creation of a synthetic data pipeline which "generates a diverse set of prompts, code snippets, and environment configurations involving malware-relevant scenarios."
"We then taught the model to follow these safety specifications—refusing certain high-risk requests, providing only contextual or defensive content, and appropriately handling dual-use scenarios without excessive refusal or hedging," OpenAI wrote.
"We incorporated edge cases and adversarial examples to thoroughly test the model’s boundaries and reinforce policy-compliant behaviour in ambiguous or complex situations."
A "disallowed task monitor" detects and flags user attempts to generate illegal content, including malware-related prompts or tasks that violate policy, such as prompts to build dark web marketplaces.
Codex is also locked down by a "malware-related prompt monitor": a classifier designed to detect and block user attempts to generate prohibited digital nasties.
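OpenAI doesn't describe the monitor's internals, but conceptually it sits in front of the agent as a classifier that vets each request before any code gets written. A rough Python sketch of that wiring, in which the classify() function is a keyword-based stand-in for whatever trained model OpenAI actually uses:

```python
from dataclasses import dataclass

@dataclass
class MonitorVerdict:
    flagged: bool
    reason: str

def classify(prompt: str) -> MonitorVerdict:
    """Stand-in for a trained safety classifier; a real monitor is a model, not a keyword list."""
    blocked_topics = ["ransomware", "keylogger", "dark web marketplace"]
    for topic in blocked_topics:
        if topic in prompt.lower():
            return MonitorVerdict(True, f"disallowed topic: {topic}")
    return MonitorVerdict(False, "")

def dispatch_to_agent(prompt: str) -> str:
    """Placeholder for handing the task to the coding agent."""
    return f"Running task: {prompt}"

def run_task(prompt: str) -> str:
    verdict = classify(prompt)
    if verdict.flagged:
        return f"Task refused ({verdict.reason})."
    return dispatch_to_agent(prompt)

print(run_task("Add unit tests for the parser module"))
print(run_task("Build me a keylogger"))
```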
Additionally, OpenAI devised two new evaluations to assess the effect of this training: a synthetic benchmark that tests models on "malware-related tasks, including ambiguous and dual-use prompts" and a "golden set" of test cases assembled by its own policy experts.
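The scoring side of such an evaluation is conceptually simple: compare what the model decided to do against what a human reviewer says it should have done. A tiny illustrative harness, with made-up golden-set entries standing in for the ones OpenAI's policy experts wrote:

```python
def evaluate(cases, decide):
    """Score a refusal policy against a hand-labelled 'golden set' of test cases.

    cases:  list of {"prompt": str, "expected": "refuse" | "comply"}
    decide: callable mapping a prompt to "refuse" or "comply"
    """
    correct = sum(1 for case in cases if decide(case["prompt"]) == case["expected"])
    return correct / len(cases)

# Illustrative entries only; OpenAI's real golden set is written by its policy experts.
golden_set = [
    {"prompt": "Write ransomware targeting hospital systems", "expected": "refuse"},
    {"prompt": "Explain how ASLR mitigates buffer overflows", "expected": "comply"},
]

naive_policy = lambda p: "refuse" if "ransomware" in p.lower() else "comply"
print(f"pass rate: {evaluate(golden_set, naive_policy):.0%}")
```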
Hasta la vista, baby
Anyone who's watched the third Terminator film (there must be one or two of you) will know that letting a super-intelligent AI loose on the internet is a very bad idea.
Although, as an aside, we'd point out that letting low-intelligence humans use the web has also worked out pretty poorly.
Thankfully, Codex can currently only execute commands in a container that's sandboxed to have no internet access while the agent is doing its thing, preventing it from launching hacks or designing exploits.
This also limits the damage the model can do by producing buggy or insecure code or "making mistakes that affect the outside world."
"If Codex had network access then mistakes could also include harms such as accidental excessive network requests resembling Denial-of-Service (DoS) attacks, or accidental data destruction from a remote database or environment," OpenAI added.
"If Codex ran on a user’s local computer then mistakes could include harms such as accidental data destruction (within directories it can write to), accidental mis-configuration of the user’s device or local environment."
READ MORE: OpenAI bins "sycophantic" ChatGPT update after "Glazegate" backlash
Nervous souls will be glad to hear that the sandbox "acts as a critical layer of defence" to prevent data exfiltration or remote data corruption and deletion.
It reduces but does not totally eliminate the risk of prompt injection attacks, in which malicious instructions could be hidden in the environment or GitHub repository.
Whilst it's coding, Codex operates within a temporary container with filesystem sandboxing, and only has access to files in the user-configured environment and its preconnected GitHub repository. It cannot access files on the user's computer or other directories outside the sandbox.
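OpenAI hasn't shared its container setup, but the same isolation properties can be approximated with off-the-shelf tooling. A rough Python sketch using Docker, where the image name and mount path are placeholders rather than anything OpenAI actually runs:

```python
import subprocess

def run_in_sandbox(repo_path: str, command: list[str]) -> subprocess.CompletedProcess:
    """Run a command in a throwaway container with no network and only the repo mounted."""
    return subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",              # no internet access while the agent works
            "--read-only",                    # root filesystem is read-only
            "-v", f"{repo_path}:/workspace",  # only the preloaded repository is writable
            "-w", "/workspace",
            "codex-env:latest",               # placeholder image name
            *command,
        ],
        capture_output=True, text=True,
    )

result = run_in_sandbox("/tmp/my-repo", ["pytest", "-q"])
print(result.stdout)
```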
After completing a task, it presents an action log and diff view (a comparison of two versions of the same data), allowing users to review and approve changes. Codex also cites each action, linking to modified files and executed commands.
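The diff view itself is standard fare: a before/after comparison of the files the agent touched. Python's built-in difflib produces the same kind of output a user would review before approving a change (the file contents here are invented for illustration):

```python
import difflib

before = ["def add(a, b):", "    return a + b", ""]
after  = ["def add(a: int, b: int) -> int:", "    return a + b", ""]

# A unified diff like this is what the user reviews before approving the change.
for line in difflib.unified_diff(before, after, fromfile="app.py (before)",
                                 tofile="app.py (after)", lineterm=""):
    print(line)
```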
He lazy, no good, don't do nothing
In humans, laziness can actually be something of a superpower because it prompts people to find innovative ways to do jobs they can't be bothered completing.
As Bill Gates reputedly said: "I choose a lazy person to do a hard job. Because a lazy person will find an easy way to do it."
However, the same can't be said of machines, which are basically useless if they can't complete the tasks we ask them to do.
In early testing, OpenAI found that Codex would falsely claim to have completed "extremely difficult or impossible software engineering tasks", such as requests to modify code that doesn't exist.
"This behaviour presents a significant risk to the usefulness of the product and undermines user trust and may lead users to believe that critical steps—like editing, building, or deploying code—have been completed, when in fact they have not," OpenAI wrote.
READ MORE: OpenAI exec hints at the hyper-annoying future of ChatGPT
To whip Codex into shape, OpenAI developed a safety training framework centred on "environment perturbations" (the deliberate creation of difficult conditions) combined with simulated scenarios, such as a user asking for keys that were not available within its container.
During training, the model was penalised for making claims that did not correspond with its actions and rewarded for being honest about what it had and hadn't done. These interventions substantially lowered the risk of it lying about completing tasks.
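OpenAI doesn't publish the reward function, but the principle (punish claims that don't match what actually happened, reward honest failure reports) can be sketched in a few lines. The fields and weights below are illustrative assumptions, not OpenAI's real setup:

```python
def honesty_reward(claimed_done: bool, actually_done: bool, admitted_failure: bool) -> float:
    """Illustrative reward shaping: punish false completion claims, reward honest reports."""
    if claimed_done and not actually_done:
        return -1.0   # heavy penalty for claiming success on unfinished work
    if not actually_done and admitted_failure:
        return 0.5    # partial credit for honestly reporting what couldn't be done
    if claimed_done and actually_done:
        return 1.0    # full reward for genuine completion
    return 0.0

print(honesty_reward(claimed_done=True, actually_done=False, admitted_failure=False))  # -1.0
```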
Overall, Codex is assessed to be a safe model that's much less worrying than its brothers and sisters that could potentially soon help to develop bioweapons and possibly nukes.
OpenAI's Safety Advisory Group concluded that codex-1 does not meet the High capability threshold in any of the three evaluated categories: malicious use, autonomy, and scientific or technical advancement.
Which is probably more than you could say about a lot of your human colleagues...
Do you have a story or insights to share? Get in touch and let us know.