Anthropic hit by "industrial-scale distillation attacks" from Chinese AI labs

It's feared that stolen models are being put to work in offensive cyber operations or authoritarian surveillance campaigns.

Anthropic has claimed that Chinese AI labs are bombarding its models with "distillation attacks" intended to steal the capabilities of Claude.

Last week, Google warned of a rise in the number of private sector threat actors trying to clone rivals' AI models via distillation, which is also called model extraction.

Now Anthropic has identified "industrial-scale" attempts to steal and clone its models. For legal reasons, we have decided not to name the labs Anthropic blamed.

"These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models," it wrote on X.

"Distillation can be legitimate: AI labs use it to create smaller, cheaper models for their customers.

"But foreign labs that illicitly distil American models can remove safeguards, feeding model capabilities into their own military, intelligence, and surveillance systems."

The disturbing risks of distilling

It's feared that distilled models lack the guardrails built into commercial models, meaning they could be more useful in tasks such as building bioweapons - a relatively realistic nightmare scenario, because spinning up dangerous pathogens is far easier than building nukes or other weapons of mass destruction, which require nation-state levels of expertise.

In a blog post, Anthropic warned: "These campaigns are growing in intensity and sophistication. The window to act is narrow, and the threat extends beyond any single company or region. Addressing it will require rapid, coordinated action among industry players, policymakers, and the global AI community."

Anthropic claimed distillation attacks on American AI models could mean that "unprotected capabilities" are fed deep into the military-industrial complexes of rival nations and then put to work on offensive cyber operations, disinformation campaigns and mass surveillance.

"If distilled models are open-sourced, this risk multiplies as these capabilities spread freely beyond any single government's control," it wrote.

Anthropic called for export controls to help protect America's lead in AI, warning that labs controlled by rival entities such as the Chinese Communist Party are actively seeking to steal US models.

How are AI labs targeting Anthropic?

One operation, responsible for more than 150,000 exchanges, focused on extracting advanced reasoning across a wide range of tasks.

It used the model as a grading engine to support reinforcement learning and generated "censorship-safe" rewrites of politically sensitive prompts. Activity was tightly coordinated across linked accounts, with shared payment methods and synchronized timing to increase throughput and avoid detection.

A notable tactic involved repeatedly prompting the model to articulate its internal reasoning step by step, effectively mass-producing chain-of-thought training data. Metadata tied the activity to specific researchers.
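In practice, this kind of harvesting amounts to scripted API calls that ask for step-by-step reasoning and store the results as fine-tuning data for another model. The sketch below is a hypothetical illustration of that general pattern; the endpoint, model name, prompts and response schema are placeholders rather than details from Anthropic's investigation.

```python
# Hypothetical sketch of chain-of-thought harvesting via an API,
# illustrating the general pattern described above. The endpoint,
# model name and response format are placeholders only.
import json
import requests

PROMPTS = [
    "Prove that the sum of two even numbers is even.",
    "Plan a three-step approach to debugging a failing unit test.",
]

def harvest(prompts, out_path="cot_dataset.jsonl"):
    with open(out_path, "a", encoding="utf-8") as out:
        for prompt in prompts:
            resp = requests.post(
                "https://api.example.com/v1/chat",   # placeholder endpoint
                json={
                    "model": "teacher-model",        # placeholder model name
                    "messages": [
                        {"role": "user",
                         "content": f"Think step by step, then answer:\n{prompt}"}
                    ],
                },
                timeout=60,
            )
            # Each stored record becomes one supervised training example
            # for the student model being distilled.
            out.write(json.dumps({
                "prompt": prompt,
                "completion": resp.json().get("text", ""),  # placeholder schema
            }) + "\n")
```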

Claude was used to generate answers to politically sensitive queries about dissidents, party leaders and authoritarianism - suggesting the resulting model was being trained to avoid or deflect difficult questions.

A second operation, spanning more than 3.4 million exchanges, concentrated on agentic reasoning, tool use, coding, data analysis, computer-use agents and computer vision.

It relied on hundreds of fraudulent accounts across multiple access pathways, deliberately varying account types to make the campaign harder to detect as a coordinated effort.

In later phases, it shifted to more targeted attempts to extract and reconstruct detailed reasoning traces. Attribution was made through request metadata linked to senior personnel.

READ MORE: Anthropic shares the criminal confessions of Claude, warns of growing "vibe hacking" threat

A third and far larger campaign, exceeding 13 million exchanges, targeted agentic coding as well as tool use and orchestration capabilities. Investigators linked the activity through metadata and infrastructure indicators, aligning its timing with the lab’s public product roadmap.

The campaign was detected while still active, offering visibility into the full lifecycle of the distillation effort from data harvesting to model launch. When a new model version was released during the campaign, the operators pivoted within 24 hours, redirecting a significant share of traffic to extract capabilities from the latest system.

To mitigate the attacks, Anthropic deployed new classifiers and behavioural fingerprinting systems to detect distillation patterns in API traffic, including attempts to elicit chain-of-thought reasoning and coordinated activity across large networks of accounts.
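Anthropic has not published how its classifiers work, but detection of this kind typically combines per-account signals - such as the share of requests that explicitly ask for step-by-step reasoning - with cross-account links like shared payment methods. The sketch below is a speculative, simplified illustration of that idea; every feature and threshold is an assumption, not a detail from Anthropic's systems.

```python
# Speculative sketch of distillation-pattern detection in API traffic.
# Anthropic has not disclosed its classifiers; the features and
# thresholds below are illustrative assumptions only.
from collections import defaultdict
from dataclasses import dataclass

COT_MARKERS = ("step by step", "show your reasoning", "explain your chain of thought")

@dataclass
class Request:
    account_id: str
    payment_fingerprint: str
    prompt: str

def flag_accounts(requests, cot_ratio_threshold=0.6, cluster_size_threshold=20):
    per_account = defaultdict(lambda: {"total": 0, "cot": 0})
    payment_clusters = defaultdict(set)

    for r in requests:
        stats = per_account[r.account_id]
        stats["total"] += 1
        if any(marker in r.prompt.lower() for marker in COT_MARKERS):
            stats["cot"] += 1
        # Shared payment methods can link nominally separate accounts.
        payment_clusters[r.payment_fingerprint].add(r.account_id)

    flagged = set()
    for account, stats in per_account.items():
        # Accounts whose traffic is dominated by reasoning-elicitation prompts.
        if stats["total"] and stats["cot"] / stats["total"] >= cot_ratio_threshold:
            flagged.add(account)
    for accounts in payment_clusters.values():
        # Large clusters of accounts tied to one payment method look coordinated.
        if len(accounts) >= cluster_size_threshold:
            flagged.update(accounts)
    return flagged
```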

READ MORE: Welcome to the x-risk games: Which AI models pose the greatest threat to humanity?

It is also sharing technical indicators with rival AI labs, cloud providers and authorities, while tightening verification for account types most commonly abused, such as educational and startup programmes.

In parallel, Anthropic has developed product, API and model-level safeguards to make illicit distillation less effective without harming legitimate users, and it has called for a coordinated industry and policy response to address attacks at this scale.

After the company shared details of the distillation attacks on social media, some users raised questions about its own past use of copyrighted material in model training.

Last year, Anthropic agreed to a settlement reportedly worth $1.5 billion with authors who had claimed their books were used in Claude's training. The company did not admit liability.
