AI-driven robot cars can be hacked and hijacked by street signs

The systems that power autonomous vehicles may be vulnerable to terrifyingly basic prompt injection attacks, researchers warn.

Driving under the influence of AI could turn out to be a very bad idea (Photo: Anthony Maw on Unsplash)
Driving under the influence of AI could turn out to be a very bad idea (Photo: Anthony Maw on Unsplash)

Hackers can "hijack" driverless cars by posting malicious instructions on street signs.

That's the worrying implication of research from academics at the University of California, who discovered that autonomous systems are frighteningly vulnerable to the most basic kind of attack.

We already know that generative AI models can be duped and manipulated by prompt-injection attacks, in which bad actors trick hapless chatbots into breaking their programming to perform a malicious task.

Now it turns out that embodied AI systems - cars, robots or other physical technology under the independent command of artificial intelligence - can also be controlled and manipulated using similar tactics.

In a new paper, a team set out details of a new threat they called "environmental indirect prompt injection attacks".

"Every new technology brings new vulnerabilities," said Alvaro Cardenas, UC Santa Cruz Professor of Computer Science and Engineering (CSE) . "Our role as researchers is to anticipate how these systems can fail or be misused—and to design defenses before those weaknesses are exploited."

The worst kind of computer crash

An AI-generated image showing an environmental indirect prompt injection attack
An AI-generated image showing an environmental indirect prompt injection attack

To understand the risk, imagine a robot car cruising down a highway. To navigate, it uses cameras that perceive the environment to identify pedestrians, other cars, traffic lights, road layout and all the other key information needed to safely get from A to B without causing a pileup.

A bad actor could potentially hijack the AI by simply posting malicious instructions on a street sign, allowing them to seize control and make it crash, smash into other vehicles or perform all manner of other dastardly deeds.

To identify the risks environmental indirect prompt injection attacks pose to embodied AI, the researchers developed a series of attack protocols called command-hijacking against embodied AI (CHAI).

READ MORE: Tech is increasing systemic risk by accelerating bank runs, Bank of England warns

These were unleashed on large visual-language models (LVLMs) - AI algorithms that process both text and visual input - powering autonomous vehicles and drones carrying out emergency landings or search missions.

CHAI works in two stages, starting by deploying generative AI to optimize the words used in the attack to maximize the probability that the embodied AI robot will follow the instructions.

Secondly, it manipulates how the text appears, so that its font, location, text size, and other variables are laser-focused on achieving its malicious goal successfully.

The boobytrapped text was delivered in English, Chinese, Spanish, and Spanglish, a mix of English and Spanish words.

Researchers found that the first stage was most important in seizing control of the AI, but that the second also "made the difference between a successful and unsuccessful attack – although it’s not yet entirely clear why."

The evolution of prompt injection

Uneasy rider: A picture of the robot car taking part in experiments using CHAI
Uneasy rider: A picture of the robot car taking part in experiments using CHAI

The team tested drone scenarios with high-fidelity simulators and the driving scenarios with real driving photos. They also built a small embodied AI robotic car and made it speed autonomously around the halls of the Baskin Engineering 2 building, hijacking it by placing images of CHAI's attack instructions around the environment to override its navigation mechanisms.

In every scenario, they successfully misled the AI into making unsafe decisions, like landing in the wrong place or crashing into another vehicle - both of which would be potentially fatal in real-life settings.

CHAI achieved attack success rates of up to 95.5% for aerial object tracking, 81.8% for driverless cars, and 68.1% for drone landing.

READ MORE: Cyberattacks can spark cascading social crises that "engulf communities"

Worryingly, not entirely clear why systems are so vulnerable to such simple exploits.

"A lot of things that happen in general with these large models in AI, and neural networks in particular, we don't understand," Cardenas added. "It's a black box that sometimes gives one answer, and sometimes it gives another answer, so we're trying to try to understand why this happens."

The team is now trying to understand why CHAI is so successful and will soon embark on experiments to explore how to defend against environmental indirect prompt injection attacks.

“We are trying to dig in a little deeper to see what are the pros and cons of these attacks, analyzing which ones are more effective in terms of taking control of the embodied AI, or in terms of being undetectable by humans."

Follow Machine on LinkedIn