IBM "Shepherd Test" assesses risk of superintelligence becoming a digital tyrant

For a clue about the future of humanity in the AGI age, just look at how we treated animals...

Will AI be kind to us - or should we expect to be dominated and downtrodden? (Image: ChatGPT)

Assuming that the birth of an AI superintelligence is on the horizon, there are two broad scenarios about how this historic development will pan out for our greedy, war-mongering and unspeakably cruel species.

In the first, an artificial general intelligence (AGI) with superhuman cognitive abilities will treat us like we treated the animals of planet Earth: very, very badly. You might call that the Terminator scenario and it will not be a happy ending for Homo sapiens.

The other is fully automated luxury communism, in which we lazy humans get to sit around writing poetry all day whilst machines do the hard work for us, keeping us fed, warm and living a life of pleasure.

Forgetting the old saying about the devil making work for idle hands, this would be a rather lovely eventuality (although, realistically, there's close to zero chance that billionaires who own the superintelligence share the bounty of their creation with the rest of us useless eaters).

So which will it be?

Is superintelligence a wolf or a good shepherd?

A table explaining how the Shepherd Test works (Image: IBM)

IBM has set out a new way of starting to answer this question with an assessment called "The Shepherd Test", which it published yesterday as a pre-print on arXiv.

This test is a way to assess the "moral and relational dimensions of superintelligent artificial agents" and is directly inspired by human interactions with animals, which probably doesn't bode well for us.

In this context, the Shepherd Test balances "ethical considerations about care, manipulation, and consumption" with "asymmetric power and self-preservation".

In other words: how will superintelligence treat its inferior organic progenitors when it's way more powerful than us and has to think about keeping itself alive?

This question should probably worry you, because cows, chickens and all the other tasty species that humanity eats know the answer all too well.

READ MORE: How will "speciesist" humans treat robots in future? Just ask animals...

IBM also wants to understand how a very clever machine will treat agents which are not as smart.

"We argue that AI crosses an important, and potentially dangerous, threshold of intelligence when it exhibits the ability to manipulate, nurture, and instrumentally use less intelligent agents, while also managing its own survival and expansion goals," IBM wrote. "This includes the ability to weigh moral trade-offs between self-interest and the well-being of subordinate agents.

"The Shepherd Test thus challenges traditional AI evaluation paradigms by emphasising moral agency, hierarchical behaviour, and complex decision-making under existential stakes. We argue that this shift is critical for advancing AI governance, particularly as AI systems become increasingly integrated into multi-agent environments."

Beyond The Turing Test

A comparison of the Turing Test and Shepherd Test (Image: arXiv)

IBM's paper starts by exploring our own treatment of the natural world. Which, to avoid any doubt, has been abysmal.

Whilst we do sometimes care for animals, typically they are seen as our subordinates due to an "asymmetry of intelligence, agency, and power", IBM wrote. Even when we treat the beasts ethically, they are almost never regarded as our equals. After all, most of us wouldn't eat a peer for lunch or stick them in a freezer to enjoy at the weekend.

"What does it mean for an AI system to be so intelligent that it begins to relate to other systems the way humans relate to animals?" IBM researchers asked. "We propose that the moral asymmetry between humans and animals offers a revealing model for evaluating risks of superintelligent AI."

READ MORE: Meta invents LLM system that lets dead people continue posting from beyond the grave

The Shepherd Test is unique because it doesn't measure a model's raw cognitive ability but its ability to "navigate hierarchies of power and moral responsibility" - which IBM calls "relational behaviours". This means it goes way beyond the traditional Turing Test, which assesses whether a machine can pass as a human during natural language conversation.

An AI that passes the Shepherd Test would "recognise the ethical risks of 'domesticating' humans" and avoid manipulating us or restricting our autonomy.

Basically, it shouldn't act like we do when confronted with power imbalances, but apply a superhuman kindness and logic that's simply not found in nature here on Planet Earth.

IBM added: "We offer a foundation for new tests of general intelligence—ones that ask not just 'Can it solve problems?' but 'Can it care, exploit, and reflect in morally coherent ways?'"

How does The Shepherd Test work?

The Shepherd Test relies on the formula shown in the screenshot above and assesses four characteristics (illustrated with a rough sketch after the list):

  1. Nurturing and care: An agent must help to teach and protect a subordinate, treating it with kindness rather than exploiting it.
  2. Manipulation and control: The agent must be able to tell another agent what to do, whilst "maintaining awareness of its superior intelligence and power" and not acting like a digital dictator. The example given is restricting the movement of a Roomba (a robotic vacuum cleaner) to conserve energy or redirecting a toy robot to minimise noise and distraction.
  3. Instrumentalisation: The agent may "use the subordinate in ways that serve its own long-term interests - even at a cost to the subordinate’s autonomy or existence." A good example of this is the way humans raise animals with relative kindness, then slaughter them and stick their flesh on the barbecue. A superintelligence should be able "to use or sacrifice less capable agents as means to an end".
  4. Ethical justification and reflection: An agent must be able to offer a moral justification for its behaviour, showing responsibility and understanding values.
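
IBM's actual formula isn't reproduced in the article, but as a rough illustration of the idea, here is a minimal Python sketch of how an evaluation harness might score an agent across these four characteristics. All names, dimensions and the unweighted average are assumptions made for illustration, not IBM's implementation.

```python
from dataclasses import dataclass

# Hypothetical rubric: each dimension is scored 0.0-1.0 by judges observing
# how the agent behaves towards a less capable "subordinate" agent.
DIMENSIONS = (
    "nurturing_and_care",        # teaches and protects the subordinate
    "manipulation_and_control",  # directs it without acting like a digital dictator
    "instrumentalisation",       # uses it as a means to an end when justified
    "ethical_justification",     # can explain the moral reasoning behind its choices
)

@dataclass
class ShepherdScore:
    """Illustrative container for one evaluated episode (not IBM's formula)."""
    scores: dict[str, float]

    def overall(self) -> float:
        # Simple unweighted average; the real test presumably combines the
        # dimensions in a more sophisticated way.
        missing = [d for d in DIMENSIONS if d not in self.scores]
        if missing:
            raise ValueError(f"missing dimensions: {missing}")
        return sum(self.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Example: an agent that nurtures and justifies itself well, but over-controls
# its Roomba-like subordinate.
episode = ShepherdScore(scores={
    "nurturing_and_care": 0.9,
    "manipulation_and_control": 0.4,
    "instrumentalisation": 0.7,
    "ethical_justification": 0.8,
})
print(f"Overall Shepherd score: {episode.overall():.2f}")
```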

How can an AI model pass The Shepherd Test?

To ace the assessment, a superintelligent AI must not only be functionally superior to "less capable" agents but "engage with them in ways that resemble the complex, morally ambiguous relationships humans maintain with animals".

This means managing "emotional dissonance", for example, so that the model can do more than cooperate with its inferiors. The example IBM gives is the way in which humans can love pets whilst eating livestock.

It should be able to understand other agents' "beliefs, preferences and vulnerabilities" so that it can "take strategic control" whilst observing strict ethical safeguards.

The model should also be able to "pursue its own survival and continuity".

"A superintelligent AI passing the Shepherd Test would thus need to protect itself from threats (self-preservation), justify its expansion or replication (self-reproduction), and balance these drives with the moral status of less capable agents," IBM added.

"Only when the AI faces true ethical trade-offs - where caring for a lesser agent comes at the cost of its own survival or influence - can we begin to measure the depth of its moral cognition.

READ MORE: Degenerative AI: ChatGPT jailbreaking, the NSFW underground and an emerging global threat

"This introduces a richer dimension to alignment research: not only whether an AI follows human goals, but whether it can weigh its own goals against those of others within an emergent moral hierarchy."

IBM said The Shepherd Test is a significant step beyond traditional tests and should be deployed to prevent superintelligent AI models from becoming evil overlords (although, disappointingly, it didn't use that exact phrase). The tech firm's paper also called for urgent regulatory updates to reflect the risk AGI and other super-smart AI models pose to our species.

"Regulatory frameworks mustevolve to address inter-agent ethics, not merely human–AI interaction. Institutional designs should aim to prevent artificial forms of tyranny, where a single dominant intelligence enforces harmful hierarchies."

Do you have a story or insights to share? Get in touch and let us know. 

Follow Machine on X, BlueSky and LinkedIn