Note 0427 March 20265 min

Against the demo. For the shipping experiment.

A demo is a story told once, in good lighting, to people who want to be persuaded. A shipping experiment is a story told to itself, in the dark, every day, by an interface that doesn't know who's watching. The two test different things. Most of what gets called "AI strategy" is the first kind being mistaken for the second.

In a demo, the prompt is curated, the input is rehearsed, and failure is invisible. Six edges of failure can be hidden behind one well-phrased sentence. In a shipping experiment, the prompt is accidental, the input is whatever a tired user actually types, and failure compounds — yesterday's bad answer is in today's training distribution if the system has any feedback loop at all.

We've stopped doing demos. Or rather, we've stopped doing demos as the test. A demo is fine as a way of explaining a built thing to someone who can't see the screen yet. It is a terrible way of deciding whether the thing should exist. The decision should be made against deployment data, not against your fifth-best phrasing of what it does. If the experiment has not run for a month against actual use, you do not know what it does.