Anthropic caught three Chinese labs running 16 million fraudulent conversations to extract Claude's capabilities. The 'Cold War' framing dominated coverage — but the real problem is piracy, and it affects every company buying AI models.
16M+ fraudulent conversations · fake accounts · 3 Chinese labs exposed
The most valuable intelligence ever created is stored as math — weightless, copyable, extractable through a chat window. The pressure gradient between the cost of creation (billions) and extraction (thousands) is so extreme that distillation is inevitable — regardless of geopolitics.
This is not a China problem. It's an information economics problem. The incentive to distill would exist even if China and the US were close allies.
Anthropic traced the operations through payment metadata, public researcher profiles, and request patterns.
Exchanges: 13M+
Focus: agentic coding and tool orchestration
Method: pivoted within 24 hours of each Claude release to capture the latest capabilities
The resulting model scores well on coding benchmarks — because benchmarks test exactly what the distiller optimized for.
Exchanges: 3.4M
Focus: agentic reasoning, tool use, computer use, computer vision
Method: a surgical approach, extracting and reconstructing Claude's reasoning traces directly
Anthropic attributed the campaign to senior Moonshot staff via request metadata.
Exchanges: targeted
Focus: reasoning traces + political censorship data
Method: separated out the reasoning behind each completed response and wrote it out step by step
DeepSeek also used Claude to generate 'safe' alternative responses to queries about dissidents and party leaders: training data that teaches DeepSeek models to avoid topics sensitive to the Chinese government.
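Mechanically, none of this requires exotic tooling. Stripped to its essentials, API-based distillation is a loop: send prompts, record responses, save them as fine-tuning data. A minimal sketch, with a hypothetical `teacher.complete()` client standing in for any chat API; this illustrates the generic technique, not any lab's actual pipeline:

```python
import json

def collect_distillation_data(teacher, prompts, out_path):
    """Query a teacher model and store (prompt, response) pairs as
    SFT-ready JSONL. `teacher.complete()` is a hypothetical API client."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            response = teacher.complete(prompt)  # hypothetical call
            record = {"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": response},
            ]}
            f.write(json.dumps(record) + "\n")

# A student fine-tuned on this file inherits only the behavior the prompt
# set happened to sample -- the source of the "narrow manifold" below.
```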
When one side has capabilities worth trillions and the other can extract them for thousands, information moves. Always.
Frontier training: $100M–$1B+ · months of compute · wide capability manifold
Distillation: $1K–$100K · days or weeks · narrow manifold
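Back-of-envelope arithmetic shows how the extraction side of that comparison lands in the tens of thousands of dollars. The token counts and per-token price below are illustrative assumptions, not actual Claude pricing:

```python
# Rough API cost of harvesting 16M conversations for distillation.
conversations = 16_000_000
tokens_per_conversation = 1_000   # assumption: prompt + response combined
usd_per_million_tokens = 3.00     # assumption: blended input/output rate

api_cost = conversations * tokens_per_conversation / 1e6 * usd_per_million_tokens
print(f"extraction: ~${api_cost:,.0f}")                  # ~$48,000

frontier_cost = 500_000_000       # midpoint of the $100M-$1B+ range
print(f"creation/extraction ratio: {frontier_cost / api_cost:,.0f}:1")  # ~10,417:1
```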
This is the same gradient that has made information flow downhill since Napster in 1999: when the cost of copying approaches zero, no regulation can fully stop it.
The music industry learned this in 1999 and the film industry in 2003: piracy doesn't stop. It slows down.
Music: Napster drives the cost of copying to ~$0 · AI: 2024–25, industrial-scale distillation
Music: Napster shut down; Kazaa and LimeWire emerge · AI: 2026, Anthropic exposes the labs; new methods will emerge
Music: piracy slowed but never stopped · AI: distillation will slow but never stop
Music: Spotify, a new business model · AI: innovation speed > copying speed (the 90-day cycle)
If frontier model capabilities are doubling every ~90 days, the question isn't whether they'll copy — it's whether the copy will be relevant by the time it arrives.
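To put numbers on that: under a 90-day doubling assumption, a copy that takes d days to extract, train, and ship represents 2^(-d/90) of the then-current frontier. A quick sketch (the doubling rate is the article's premise, not a measured constant):

```python
def relative_capability(lag_days: float, doubling_days: float = 90) -> float:
    """Fraction of the current frontier a copy represents after a given
    extraction-and-training lag, assuming steady exponential progress."""
    return 2 ** (-lag_days / doubling_days)

for lag in (30, 90, 180):
    print(f"{lag:>3}-day lag -> {relative_capability(lag):.0%} of frontier")
# 30-day lag -> 79% of frontier
# 90-day lag -> 50% of frontier
# 180-day lag -> 25% of frontier
```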
Distillation doesn't produce a copy of the original model. It produces a compression. And that compression, like a lossy MP3, has enormous consequences.
Frontier model: trained on vast, diverse datasets over months of compute. Manifold: wide, a broad surface of competence across many task types.
Distilled model: trained on a subset of the frontier model's outputs. What it appears to do well: the tasks the distiller sampled. Where it fails: everything outside them. Manifold: narrow, brilliant at the center, fragile at the edges.
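A toy simulation makes the manifold difference concrete: fit a "student" to a "teacher" function using samples drawn only from a narrow slice of the input space, then measure error inside and outside that slice. Pure numpy, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
teacher = np.sin  # stands in for the frontier model's behavior

# The student only ever sees a narrow slice of the teacher's domain.
x_train = rng.uniform(-1.0, 1.0, 200)
student = np.polynomial.Polynomial.fit(x_train, teacher(x_train), deg=9)

def rmse(lo, hi, n=1000):
    x = np.linspace(lo, hi, n)
    return float(np.sqrt(np.mean((student(x) - teacher(x)) ** 2)))

print(f"inside the training slice [-1, 1]: RMSE {rmse(-1, 1):.2e}")  # tiny
print(f"outside the slice         [ 4, 8]: RMSE {rmse(4, 8):.2e}")  # huge
# Brilliant at the center, fragile at the edges.
```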
The Football Analogy
Watching only the highlights of an NFL game means you see fewer ads, but you also see much less of the game: only the parts the NFL thinks you'll be interested in. A distilled model is the highlights reel, not the game.
The gap between frontier and distilled models is widest exactly where AI value is headed — and no benchmark measures it well.
8+ hours of coherent operation
Navigating problems without human intervention
Using tools in combinations nobody predicted
Multi-step tasks with judgment and adaptation
While the market debates export controls and geopolitics, the real problem is in the models your company is already using.
If your company is buying or using AI models, these questions determine whether you're at risk.
Good sign: a frontier model (lower brittleness risk). Warning: distilled or unknown provenance (high risk).
Good sign: standard, benchmarkable tasks (OK as-is). Warning: agentic or autonomous work (test extensively at the edges).
Good sign: evaluation on novel situations and edge cases (adequate). Warning: standard benchmarks only (insufficient).
Good sign: you know which model you're running and how it was trained (good). Warning: no transparency (red flag).
Golden Rule
If the model will do work you don't supervise minute by minute, demand a frontier model.
The distillation problem isn't theoretical. It affects the real quality of the AI systems your company runs today.
We evaluate the models your company uses — not with off-the-shelf benchmarks, but with real agentic work scenarios that expose brittleness where it actually matters.
We build evaluation suites that test exactly where distilled models fail: long workflows, novel tool use, error recovery, reasoning under ambiguity.
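As a flavor of what such a suite can look like: scripted multi-step scenarios that inject a failure partway through and check whether the agent recovers. The `run_agent` harness and scenario shape below are hypothetical placeholders, not a real framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    steps: list[str]                      # instructions fed to the agent in order
    inject_failure_at: int                # step index where a tool error is simulated
    success_check: Callable[[str], bool]  # predicate over the final transcript

def evaluate(run_agent: Callable, scenarios: list[Scenario]) -> dict[str, bool]:
    """Run each scenario through the (hypothetical) agent harness and
    record whether the agent recovered from the injected failure."""
    return {
        s.name: s.success_check(run_agent(s.steps, fail_at=s.inject_failure_at))
        for s in scenarios
    }

# Example shape: a long refactoring workflow where the build tool "breaks"
# at step 3. Frontier models tend to reroute; distilled ones often loop or quit.
```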
We implement AI pipelines using frontier models with continuous quality monitoring — so your system doesn't degrade silently.
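Monitoring for silent degradation can start as simply as scoring a sample of production outputs against a fixed rubric and alerting when a rolling average slips. A minimal sketch; how you produce the 0-to-1 score (for example, an LLM-as-judge rubric) is an assumed component:

```python
from collections import deque

class QualityMonitor:
    """Rolling-window quality tracker for sampled production outputs."""

    def __init__(self, window: int = 200, alert_threshold: float = 0.85):
        self.scores = deque(maxlen=window)
        self.alert_threshold = alert_threshold

    def record(self, score: float) -> bool:
        """Add a 0-1 quality score; return True once the window is full
        and the rolling mean has dropped below the alert threshold."""
        self.scores.append(score)
        mean = sum(self.scores) / len(self.scores)
        return len(self.scores) == self.scores.maxlen and mean < self.alert_threshold

# Usage: alert = monitor.record(score_output(sample))
# where score_output is an assumed rubric-based (e.g. LLM-as-judge) scorer.
```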
We know the difference between a model that scores well and one that works at 3 AM when nobody's watching. That difference is where the value lives.
The difference doesn't show up on benchmarks. It shows up when the model needs to think for 8 straight hours with nobody watching. Find out if the AI your company is using can handle it.