ChatGPT will call the cops on its most dangerous users, OpenAI announces

"If reviewers determine that a case involves an imminent threat of serious physical harm, we may refer it to law enforcement."

(Photo by Jonathan Kemper on Unsplash)

OpenAI previously revealed that ChatGPT could be misused to help low-skilled scientists spin up bioweapons and reported that its semi-autonomous Agent excelled at identifying ways of "causing the most harm with the least amount of effort".

But AI cannot cause harm on its own — yet. Until the genesis of truly independent models capable of acting without direction, the blame for any crimes or acts of evil must fall on a human, not the machine they wield.

Now OpenAI has announced new plans to clamp down on its most dangerous users and report them to law enforcement if necessary.

"When we detect users who are planning to harm others, we route their conversations to specialised pipelines where they are reviewed by a small team trained on our usage policies and who are authorised to take action, including banning accounts," it wrote in a policy update.

"If human reviewers determine that a case involves an imminent threat of serious physical harm to others, we may refer it to law enforcement. We are currently not referring self-harm cases to law enforcement to respect people’s privacy given the uniquely private nature of ChatGPT interactions."

"Helping people when they need it most"

OpenAI also announced steps to protect people suffering "serious mental and emotional distress" and warned: "Recent heartbreaking cases of people using ChatGPT in the midst of acute crises weigh heavily on us."

This means that ChatGPT will not tell users how to self-harm and will instead steer them towards 988 (the US suicide and crisis hotline) or the Samaritans to seek help.

Part of the problem is that safety mechanisms can "degrade" during long conversations, OpenAI said.

For instance, ChatGPT will typically point users to a suicide hotline when they first express intent, but may subsequently forget this guidance.

READ MORE: "It's pretty sobering": Google Deepmind boss Demis Hassabis reveals his p(doom)

"But after many messages over a long period of time, it might eventually offer an answer that goes against our safeguards," it wrote. "This is exactly the kind of breakdown we are working to prevent.

"We’re strengthening these mitigations so they remain reliable in long conversations, and we’re researching ways to ensure robust behaviour across multiple conversations.

"That way, if someone expresses suicidal intent in one chat and later starts another, the model can still respond appropriately."

ChatGPT's mental health interventions

Other forms of mental distress will also be identified and de-escalated by "grounding the person in reality", with the model gently questioning delusions rather than reinforcing them.

In another policy briefing, OpenAI set out four priorities for the next 120 days:

  1. Expanding interventions to more people in crisis
  2. Making it even easier to reach emergency services and get help from experts
  3. Enabling connections to trusted contacts
  4. Strengthening protections for teens 

This work includes improvements to parental controls, which let mums and dads link their account with their teenage child's and monitor activity.

Notifications will let them know if their child is suffering a "moment of acute distress".

READ MORE: Anthropic shares the criminal confessions of Claude, warns of growing "vibe hacking" threat

In an era when tech leaders are literally losing sleep over the risks posed by AI psychosis, OpenAI deserves credit for taking steps to protect its most vulnerable users.

Although news that ChatGPT can call the cops may sound alarming to privacy advocates, the policy is unlikely to impact anyone who uses the model to perform innocent tasks.

OpenAI's policy updates are also a reminder that you shouldn't type anything into ChatGPT, Google or any other digital service that you wouldn't want to see on the front page of a newspaper.

We can never truly know what happens to content once it disappears into a black box, so it's best to be extremely cautious in all online interactions.

Think before you prompt...

Do you have a story or insights to share? Get in touch and let us know. 
