Anthropic caught three Chinese labs running 16 million fraudulent conversations to extract Claude's capabilities. The 'Cold War' framing dominated coverage — but the real problem is piracy, and it affects every company buying AI models.
16M+ fraudulent conversations · fake accounts · 3 Chinese labs exposed
The most valuable intelligence ever created is stored as math — weightless, copyable, extractable through a chat window. The pressure gradient between the cost of creation (billions) and extraction (thousands) is so extreme that distillation is inevitable — regardless of geopolitics.
This is not a China problem. It's an information economics problem. The incentive to distill would exist even if China and the US were close allies.
Anthropic traced the operations through payment metadata, public researcher profiles, and request patterns.
Exchanges: 13M+
Focus: agentic coding and tool orchestration
Method: pivoted within 24 hours of each Claude release to capture the latest capabilities
The resulting model scores well on coding benchmarks — because benchmarks test exactly what the distiller optimized for.
Exchanges: 3.4M
Focus: agentic reasoning, tool use, computer use, computer vision
Method: a surgical approach, extracting and reconstructing Claude's reasoning traces directly
Anthropic attributed the campaign to senior Moonshot staff via request metadata.
Exchanges: targeted
Focus: reasoning traces + political censorship data
Method: separated out the reasoning behind each completed response and wrote it out step by step
DeepSeek also used Claude to generate 'safe' alternative responses to queries about dissidents and party leaders: training data that teaches DeepSeek models to avoid topics sensitive to the Chinese government.
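Mechanically, none of this requires exotic tooling. Stripped to its essentials, API-based distillation is a loop: send prompts, record responses, save them as fine-tuning data. A minimal sketch, with a hypothetical `teacher.complete()` client standing in for any chat API; this illustrates the generic technique, not any lab's actual pipeline:

```python
import json

def collect_distillation_data(teacher, prompts, out_path):
    """Query a teacher model and store (prompt, response) pairs as
    SFT-ready JSONL. `teacher.complete()` is a hypothetical API client."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            response = teacher.complete(prompt)  # hypothetical call
            record = {"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": response},
            ]}
            f.write(json.dumps(record) + "\n")

# A student fine-tuned on this file inherits only the behavior the prompt
# set happened to sample -- the source of the "narrow manifold" below.
```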
When one side has capabilities worth trillions and the other can extract them for thousands, information moves. Always.
Frontier training: $100M–$1B+ · months of compute · wide capability manifold
Distillation: $1K–$100K · days or weeks · narrow manifold
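Back-of-envelope arithmetic shows how the extraction side of that comparison lands in the tens of thousands of dollars. The token counts and per-token price below are illustrative assumptions, not actual Claude pricing:

```python
# Rough API cost of harvesting 16M conversations for distillation.
conversations = 16_000_000
tokens_per_conversation = 1_000   # assumption: prompt + response combined
usd_per_million_tokens = 3.00     # assumption: blended input/output rate

api_cost = conversations * tokens_per_conversation / 1e6 * usd_per_million_tokens
print(f"extraction: ~${api_cost:,.0f}")                  # ~$48,000

frontier_cost = 500_000_000       # midpoint of the $100M-$1B+ range
print(f"creation/extraction ratio: {frontier_cost / api_cost:,.0f}:1")  # ~10,417:1
```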
This is the same gradient that has made information flow downhill since Napster in 1999: when the cost of copying approaches zero, no regulation can fully stop it.
The music industry learned this in 1999 and the film industry in 2003: piracy doesn't stop. It slows down.
Music: Napster drives the cost of copying to ~$0 · AI: 2024–25, industrial-scale distillation
Music: Napster shut down; Kazaa and LimeWire emerge · AI: 2026, Anthropic exposes the labs; new methods will emerge
Music: piracy slowed but never stopped · AI: distillation will slow but never stop
Music: Spotify, a new business model · AI: innovation speed > copying speed (the 90-day cycle)
If frontier model capabilities are doubling every ~90 days, the question isn't whether they'll copy — it's whether the copy will be relevant by the time it arrives.
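To put numbers on that: under a 90-day doubling assumption, a copy that takes d days to extract, train, and ship represents 2^(-d/90) of the then-current frontier. A quick sketch (the doubling rate is the article's premise, not a measured constant):

```python
def relative_capability(lag_days: float, doubling_days: float = 90) -> float:
    """Fraction of the current frontier a copy represents after a given
    extraction-and-training lag, assuming steady exponential progress."""
    return 2 ** (-lag_days / doubling_days)

for lag in (30, 90, 180):
    print(f"{lag:>3}-day lag -> {relative_capability(lag):.0%} of frontier")
# 30-day lag -> 79% of frontier
# 90-day lag -> 50% of frontier
# 180-day lag -> 25% of frontier
```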
Distillation doesn't produce a copy of the original model. It produces a compression. And that compression, like a lossy MP3, has enormous consequences.
Frontier model: trained on vast, diverse datasets over months of compute. Manifold: wide, a broad surface of competence across many task types.
Distilled model: trained on a subset of the frontier model's outputs. What it appears to do well: the tasks the distiller sampled. Where it fails: everything outside them. Manifold: narrow, brilliant at the center, fragile at the edges.
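A toy simulation makes the manifold difference concrete: fit a "student" to a "teacher" function using samples drawn only from a narrow slice of the input space, then measure error inside and outside that slice. Pure numpy, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
teacher = np.sin  # stands in for the frontier model's behavior

# The student only ever sees a narrow slice of the teacher's domain.
x_train = rng.uniform(-1.0, 1.0, 200)
student = np.polynomial.Polynomial.fit(x_train, teacher(x_train), deg=9)

def rmse(lo, hi, n=1000):
    x = np.linspace(lo, hi, n)
    return float(np.sqrt(np.mean((student(x) - teacher(x)) ** 2)))

print(f"inside the training slice [-1, 1]: RMSE {rmse(-1, 1):.2e}")  # tiny
print(f"outside the slice         [ 4, 8]: RMSE {rmse(4, 8):.2e}")  # huge
# Brilliant at the center, fragile at the edges.
```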
The Football Analogy
Watching only the highlights of an NFL game means you see fewer ads, but you also see much less of the game: only the parts the NFL thinks you'll be interested in. A distilled model is the highlights reel, not the game.
The gap between frontier and distilled models is widest exactly where AI value is headed — and no benchmark measures it well.
8+ hours of coherent operation
Navigating problems without human intervention
Using tools in combinations nobody predicted
Multi-step tasks with judgment and adaptation
While the market debates export controls and geopolitics, the real problem is in the models your company is already using.
If your company is buying or using AI models, these questions determine whether you're at risk.
Good sign: a frontier model (lower brittleness risk). Warning: distilled or unknown provenance (high risk).
Good sign: standard, benchmarkable tasks (OK as-is). Warning: agentic or autonomous work (test extensively at the edges).
Good sign: evaluation on novel situations and edge cases (adequate). Warning: standard benchmarks only (insufficient).
Good sign: you know which model you're running and how it was trained (good). Warning: no transparency (red flag).
Golden Rule
If the model will do work you don't supervise minute by minute, demand a frontier model.
The distillation problem isn't theoretical. It affects the real quality of the AI systems your company runs today.
We evaluate the models your company uses — not with off-the-shelf benchmarks, but with real agentic work scenarios that expose brittleness where it actually matters.
We build evaluation suites that test exactly where distilled models fail: long workflows, novel tool use, error recovery, reasoning under ambiguity.
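As a flavor of what such a suite can look like: scripted multi-step scenarios that inject a failure partway through and check whether the agent recovers. The `run_agent` harness and scenario shape below are hypothetical placeholders, not a real framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    steps: list[str]                      # instructions fed to the agent in order
    inject_failure_at: int                # step index where a tool error is simulated
    success_check: Callable[[str], bool]  # predicate over the final transcript

def evaluate(run_agent: Callable, scenarios: list[Scenario]) -> dict[str, bool]:
    """Run each scenario through the (hypothetical) agent harness and
    record whether the agent recovered from the injected failure."""
    return {
        s.name: s.success_check(run_agent(s.steps, fail_at=s.inject_failure_at))
        for s in scenarios
    }

# Example shape: a long refactoring workflow where the build tool "breaks"
# at step 3. Frontier models tend to reroute; distilled ones often loop or quit.
```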
We implement AI pipelines using frontier models with continuous quality monitoring — so your system doesn't degrade silently.
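Monitoring for silent degradation can start as simply as scoring a sample of production outputs against a fixed rubric and alerting when a rolling average slips. A minimal sketch; how you produce the 0-to-1 score (for example, an LLM-as-judge rubric) is an assumed component:

```python
from collections import deque

class QualityMonitor:
    """Rolling-window quality tracker for sampled production outputs."""

    def __init__(self, window: int = 200, alert_threshold: float = 0.85):
        self.scores = deque(maxlen=window)
        self.alert_threshold = alert_threshold

    def record(self, score: float) -> bool:
        """Add a 0-1 quality score; return True once the window is full
        and the rolling mean has dropped below the alert threshold."""
        self.scores.append(score)
        mean = sum(self.scores) / len(self.scores)
        return len(self.scores) == self.scores.maxlen and mean < self.alert_threshold

# Usage: alert = monitor.record(score_output(sample))
# where score_output is an assumed rubric-based (e.g. LLM-as-judge) scorer.
```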
We know the difference between a model that scores well and one that works at 3 AM when nobody's watching. That difference is where the value lives.
The difference doesn't show up on benchmarks. It shows up when the model needs to think for 8 straight hours with nobody watching. Find out if the AI your company is using can handle it.