"We need agreed guidelines": How to prevent AI tools from causing harm

"Broad take-up of these simple, universal checkpoints will lead to safer AI and the translation of more research ideas into products"

"AI can cause serious harm when implemented without proper evaluation," Tracey Brown warns (Photo by Harrison Broadbent on Unsplash)
"AI can cause serious harm when implemented without proper evaluation," Tracey Brown warns (Photo by Harrison Broadbent on Unsplash)

Imagine spending years developing a cutting-edge, AI-powered software tool for spotting early-stage breast cancer, only to find that doctors can’t use it in hospitals.

Or worse, what if governments use AI tools to select university places or detect fraud but perpetuate discrimination?

Scenarios like these are playing out far too frequently in AI research and deployment because the standards for adoption are not clear to researchers, developers and adopters.

It doesn’t have to be this way.

AI can cause serious harm when implemented without proper evaluation. Knowledge and agreement about how to do that evaluation are lagging a long way behind AI's spread through processes and services.

Why? In part, it stems from the casual "tech bro" culture of sharing AI kits as beta versions that can be updated after real-world "testing". More prosaically, researchers are often focused on academic goals or simply making the tech work, so questions that are vital to future adoption are not asked early enough.

That can make it difficult or impossible to recover the necessary information later, stopping adoption in its tracks.

Addressing data deficiencies and information gaps

The missing input could be key information about the data the model was trained on, which has a bearing on which situations an AI is equipped to handle. That's relevant to AI tools used in fields as diverse as agriculture, defence and financial services, but especially so in medicine, where lives are at stake and regulation is tight.

For our early-stage breast cancer diagnostics tool, a data deficiency could stem from the fact that certain groups, such as pregnant women, were underrepresented in the training dataset. That would make its role in diagnosis less reliable in a general population and require clear criteria for its use. 

Information gaps might be more mundane: for example, whether the output from the AI tool can feed into the standard software packages used in hospitals. If it cannot, the tool might offer excellent diagnostic support but be next to useless for frontline staff in the clinic.


Plugging these gaps early on also makes AI tools more valuable. For AI-assisted medical devices, anything that will be used on real patients must ultimately clear safety standards set by a regulator, such as the Food and Drug Administration in the US. Medics often say that these regulations are "written in blood" because they respond to past failures that had catastrophic impacts on patients. The device must also pass safeguards that protect the other data and software systems already in use.

A key part of that whole process is assembling documentation that can meet an institution’s Quality Management System, a structured framework that documents the policies, processes and procedures applied throughout an application’s development.

For a researcher with a promising AI-powered medical device and a start-up company behind it, demonstrating that the necessary data has already been assembled makes the device far more attractive to potential buyers, because they can assess whether it will clear the regulatory hurdles and reach the market.

For all the debate about different countries adopting more or less permissive regulatory systems for AI, any product will bump up against existing regulations and safeguards eventually, particularly in the medical space.

Achieving a responsible AI handover

Whether AI product developers are in countries with a "hard" or "soft" approach, there is a need for agreed guidelines about what information should be captured and shared. Responsible developers and adopters everywhere are asking: "How do we make sure we are doing the right thing to ensure it can be used safely?"

At Sense about Science, the UK-based nonprofit I run, we have developed the Responsible Handover Framework for AI to help streamline AI adoption and promote safety.

Based on engineering project handover principles, its overarching purpose is empowerment. It helps everyone involved — all the way from discovery research and code development to use in the real world — to ask the right questions before adopting a tool, or taking it to the next stage of development. 

The framework is not a new layer of regulation, nor is the aim to introduce an extra bureaucratic barrier to AI adoption. Rather it provides the basis for a structured and pragmatic conversation between developers and adopters to ensure safety-critical information is not lost during development and handovers.


By providing a better understanding of the strengths and weaknesses of AI tools, the framework improves safety, helps prevent misuse and gives developers and adopters a leg up over the regulatory hurdles to adoption.

After months of testing with organisations as varied as MIT, the Institution of Engineering and Technology, and Guy’s and St Thomas’ Hospital in London, we are now sharing the framework with AI developers, funders and adopters around the world as quickly as we can.

We hope that broad take-up of these simple, universal checkpoints will lead to safer AI and the translation of more brilliant research ideas into products that will benefit all of us.

Tracey Brown OBE is Director of the charity Sense about Science.

Do you have a story or insights to share? Get in touch and let us know. 

Follow Machine on X, Bluesky and LinkedIn