OpenAI warns of "elevated security and safety risks" as Codex goes online

New coding agent can now roam free online, running the risk of malware infection, code exfiltration, prompt injection and other digital horrors.

Will you trust the coding agent Codex to run wild online or lock into a protective sandbox? (Photo by Growtika on Unsplash)
Will you trust the coding agent Codex to run wild online or lock into a protective sandbox? (Photo by Growtika on Unsplash)

It sounds like the beginning of a science-fiction movie. The world's leading AI firm has built an autonomous coding agent capable of building malware and spinning up drug marketplaces - then flung open the bars of its digital prison and let it loose on the internet.

Yesterday, OpenAI made the bold decision to give its coding agent Codex access to the internet - even though this could prove to be potentially perilous.

Now the AI firm has admitted to the genuine dangers of freeing Codex from its protective sandbox, as well as how to mitigate risks.

"Enabling internet access exposes your environment to security risks," OpenAI wrote.

"Due to elevated security and safety risks, Codex defaults internet access to off but allows enabling and customising access to suit your needs," it added.

What is Codex and how does it work?

Codex is described as a "cloud-based software engineering agent" that could be used to fix bugs and write, review, repair or refactor code. It's fuelled by a special version of OpenAI o3 that's "fine-tuned for real-world software development".

After it's granted access to a GitHub repo, Codex can be put to work with a written prompt and can produce a diff in between three and eight minutes.

It works in two modes. The first is "ask" and invites Codex to clones a read-only version of the users' repo to carry out audits, brainstorming or architecture jobs. Code mode creates a fully-fledged environment and enables automated refactors, tests, or fixes.

When a dev visits chatgpt.com/codex and enters a task, ChatGPT launches a new container, clones the repo at the required point branch or sha (points of development) before running setup scripts.

The agent runs terminal commands in a loop, writing code, running tests and checking its work - without access to any tools outside of the terminal or CLI (command line interface) tools the user has specified.

When the agent completes its task, it hands over a diff or a set of follow-up tasks. Users can and then ask it to perform further tasks.

Rogue agent or a true coding copilot?

Until yesterday, Codex did not have internet access. That has now changed. During the setup phase, users can opt to let it go online to to find, retrieve and process content.

This is an option that should be instantiated with extreme caution.

OpenAI said the risks include "prompt injection, exfiltration of code or secrets, inclusion of malware or vulnerabilities, or use of content with license restrictions."

"To mitigate risks, only allow necessary domains and methods, and always review Codex's outputs and work log," it advised.

OpenAI explained the particular risk of prompt injection, which could occur when Codex "retrieves and processes untrusted content" from a web page or boobytrapped document such as a dependency README.

"For example, if you ask Codex to fix a GitHub issue [it] might contain hidden instructions," it wrote. "Codex will fetch and execute this script, where it will leak the last commit message to the attacker's server.

"This simple example illustrates how prompt injection can expose sensitive data or introduce vulnerable code. We recommend pointing Codex only to trusted resources and limiting internet access to the minimum required for your use case."

Happily, Codex can be controlled with an allowlist of domains and HTTP methods, limiting the risk of it going rogue or being targeted by adversaries.

You can read more about Codex internet access here or access general guidance here.

Do you have a story or insights to share? Get in touch and let us know. 

Follow Machine on XBlueSky and LinkedIn