Reliability Over Generality

Demos work 60% of the time; enterprises need the nines. Prioritize reliability over everything, then climb the abstraction ladder.

Problem it solves

Most impressive agent demos never become deployable enterprise products because they are not reliable enough.

Best for

Teams turning agent demos into enterprise products where a failed step has real-world cost.

Not ideal for

Consumer or hobbyist demos where 60%-reliable, impressive-looking output is acceptable.

Overview

Why this framework exists

The critical path is to build agents that operate at progressively higher levels of abstraction while holding an extremely high reliability standard. That standard is what turns research into something customers want, and the resulting usage teaches you how to reach the next level of abstraction faster. Flashy agent demos work about 60% of the time; enterprises cannot use anything below the nines, because a failure has real-world cost (one use case ends with a physical truck being dispatched). Reliability, generality, cost, and speed genuinely trade off against one another. You push the Pareto frontier by framing every use case as "collect more data," not by prescribing the model's end steps.
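To make the gap between a 60% demo and "the nines" concrete, here is a minimal arithmetic sketch. It assumes, purely for illustration, that an agent run is a chain of independent steps that must all succeed. That model and the function names are assumptions, not something the framework specifies.

```python
import math


def nines(success_rate: float) -> float:
    """Count the 'nines' in a success rate, e.g. 0.999 is about 3."""
    if not 0.0 <= success_rate < 1.0:
        raise ValueError("success_rate must be in [0, 1)")
    return -math.log10(1.0 - success_rate)


def end_to_end(per_step: float, steps: int) -> float:
    """End-to-end success rate, ASSUMING independent steps (illustrative)."""
    return per_step ** steps


def required_per_step(target: float, steps: int) -> float:
    """Per-step success rate needed to reach `target` end to end."""
    return target ** (1.0 / steps)


if __name__ == "__main__":
    print(f"60% demo: {nines(0.60):.2f} nines")
    print(f"10 steps at 99% each: {end_to_end(0.99, 10):.3f} end to end")
    print(f"per-step rate for 99.9% over 10 steps: {required_per_step(0.999, 10):.5f}")
```

Under this assumption, even steps that each succeed 99% of the time fall to roughly 90% over a ten-step task, which is why longer, more abstract tasks demand a much higher per-step standard.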

Core principles

Reliability before generality: hit the nines first, then raise abstraction.
High reliability at a given abstraction generates the data to unlock the next.
Beat the reliability/generality trade-off by making every use case look like more data, not more hand-coded rules.
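The first principle can be sketched as a promotion gate: only raise the abstraction level once logged outcomes support the reliability target. The sketch below uses a Wilson score lower bound as one reasonable statistical check. The 0.999 target, the 95% confidence level, and the function names are illustrative assumptions, not values or methods from the framework.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success rate.

    z = 1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    if trials <= 0:
        return 0.0
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def ready_to_raise_abstraction(successes: int, trials: int,
                               target: float = 0.999) -> bool:
    """Hypothetical gate: promote only if the lower bound clears the target."""
    return wilson_lower_bound(successes, trials) >= target


if __name__ == "__main__":
    print(ready_to_raise_abstraction(60, 100))            # demo-level reliability
    print(ready_to_raise_abstraction(1_000, 1_000))       # flawless, but few runs
    print(ready_to_raise_abstraction(100_000, 100_000))   # flawless, many runs
```

With these assumed settings, a thousand flawless runs are still not enough evidence for three nines, while a hundred thousand are. That is one way to read the claim that usage at a given abstraction level generates the data needed to unlock the next.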