Agent Runtime Doubling Law
AI agent coherence time doubles every 7 months — model the exponential, not the snapshot
The Agent Runtime Doubling Law, cited by Amjad Masad (Replit CEO) from a recent academic paper, holds that the maximum coherent runtime of AI agents — the duration they can pursue a goal autonomously before losing coherence or hitting an unrecoverable error — doubles approximately every 7 months. At the time of recording, the baseline was roughly 30 minutes of sustained autonomous operation.
Extending the doubling curve from the 30-minute baseline: 7 months yields ~1 hour, 14 months ~2 hours, 21 months ~4 hours, and 28 months ~8 hours (a full working shift). Approximately 3 years yields roughly 16 to 18 hours of continuous operation, and multi-day autonomous operation follows at around 4 years. Masad noted OpenAI's o3 model appeared to double coherence over long-horizon tasks in just 3–4 months, suggesting the 7-month figure may be conservative.
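The projection above is plain exponential arithmetic, and a minimal sketch makes it easy to rerun with different inputs. The function name and parameters below are illustrative, not taken from the cited paper. The 30-minute baseline and 7-month doubling period are the figures quoted in this summary, and the model assumes clean, uninterrupted doubling.

```python
def projected_runtime_minutes(months_elapsed, baseline_minutes=30.0, doubling_months=7.0):
    """Projected coherent runtime in minutes after `months_elapsed`.

    Assumes a clean exponential: runtime doubles every `doubling_months`.
    Both defaults are the figures quoted in this summary, not measured values.
    """
    return baseline_minutes * 2 ** (months_elapsed / doubling_months)


if __name__ == "__main__":
    # Print the projected runtime at each milestone on the curve.
    for months in (0, 7, 14, 21, 28, 36, 48):
        hours = projected_runtime_minutes(months) / 60
        print(f"{months:>2} months: {hours:6.1f} hours")
```

Changing `doubling_months` to 4 shows how much earlier each milestone arrives if the faster rate Masad attributed to o3 were to hold.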
The inflection point the framework identifies is when agents cross from 'useful tool' to 'sustained labor.' Below a full working shift, agents augment human workers. Once agents reliably operate for 8+ hours without interruption, the economic calculus for labor substitution changes structurally — not incrementally. The framework is most powerful as a forcing function: instead of asking 'can AI do X today?' it asks 'when does the runtime curve make X inevitable, and how far are we from that date?'
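The forcing-function question, "how far are we from that date?", is the inverse of the doubling curve: solve for the time at which runtime reaches a target. The sketch below does that under the same assumptions as the summary (30-minute baseline, 7-month doubling). The function name and the milestone labels are hypothetical and chosen only for illustration.

```python
import math


def months_until(target_minutes, baseline_minutes=30.0, doubling_months=7.0):
    """Months until projected runtime reaches `target_minutes`.

    Inverts runtime = baseline * 2 ** (t / doubling_months).
    Returns 0.0 if the target is already within the baseline.
    """
    if target_minutes <= baseline_minutes:
        return 0.0
    return doubling_months * math.log2(target_minutes / baseline_minutes)


if __name__ == "__main__":
    # Illustrative milestones, expressed in minutes.
    milestones = {"1 hour": 60, "8-hour shift": 480, "24 hours": 1440}
    for label, minutes in milestones.items():
        print(f"{label}: about {months_until(minutes):.0f} months from baseline")
```

The output is measured from the date of the baseline observation, so it needs to be offset by however long ago that baseline was recorded.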
- Measure agent capability by runtime duration, not by benchmark scores — duration is what determines labor substitutability
- Exponential curves matter most at inflection points: the shift from hours to a full working shift is categorically different from earlier increments
- Inference token consumption scales superlinearly with runtime — longer autonomous runs are not linear extensions of shorter ones
- Faster-than-expected empirical data (o3 at 3–4 months vs. 7-month baseline) should widen confidence intervals upward, not be discounted
- The transition from augmentation to substitution is a threshold event, not a gradient — plan for discontinuity, not a smooth ramp
- Establish the current runtime baseline: Identify the current maximum coherent autonomous runtime for agents relevant to your domain, such as general-purpose LLM agents, code agents, or customer service agents. At the time of this episode the baseline was ~30 minutes. Check recent papers or model release notes for updated figures. Pro tip: Track domain-specific runtime, not just general benchmarks, because coding agents and customer-service agents may have different coherence floors.
- Project the doubling curve to your planning horizon: Using the 7-month doubling period as a conservative estimate, calculate when agents in your domain will reach the runtime thresholds that matter: 1 hour, 8 hours (a working shift), and 24 hours (a full day). Map these dates against your operational or investment timeline. Warning: Treat 7 months as the slowest plausible doubling period, not the fastest. Empirical data from o3 suggests the curve may be compressing, so build sensitivity cases at both 4 and 7 months.
- Identify the labor-substitution threshold for your use case: Determine which runtime milestone crosses the economic threshold for replacing a human role in your context. For routine text-in/text-out roles this may be 2–4 hours; for roles requiring sustained project work it may be 8+ hours. This threshold, not AI capability in the abstract, is your planning trigger. Pro tip: Ask what runtime duration makes the human supervisor role economically marginal. That is the threshold to track.
- Recalibrate quarterly against observed model releases: The 7-month figure is a snapshot from one paper. At each major model release, test or source data on whether coherence duration has shifted. If empirical data consistently beats the curve, compress your timeline; if it lags, extend it. The framework is only valuable if it is updated with actual observations. Warning: Do not lock in the 7-month figure as a law. Treat it as a prior to be updated with each new evidence point.
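The projection and recalibration steps can be sketched together: derive the doubling period implied by two runtime observations, then compare threshold dates under the 4-month and 7-month sensitivity cases. This is a rough illustration, not a forecasting method from the source. The function names are hypothetical and the observation values in the usage section are made-up placeholders, not measurements.

```python
import math


def implied_doubling_months(runtime_then, runtime_now, months_between):
    """Doubling period implied by two runtime observations in the same units."""
    if runtime_then <= 0 or runtime_now <= runtime_then or months_between <= 0:
        raise ValueError("need positive runtimes that grew over a positive interval")
    return months_between / math.log2(runtime_now / runtime_then)


def sensitivity_cases(target_minutes, baseline_minutes=30.0, doubling_periods=(4.0, 7.0)):
    """Months to reach `target_minutes` under each candidate doubling period."""
    doublings_needed = math.log2(target_minutes / baseline_minutes)
    return {period: period * doublings_needed for period in doubling_periods}


if __name__ == "__main__":
    # Placeholder observations: 30 minutes at one release, 60 minutes 3.5 months later.
    observed = implied_doubling_months(30, 60, 3.5)
    print(f"implied doubling period: {observed:.1f} months")

    # Months until an 8-hour shift (480 minutes) under the 4- and 7-month cases.
    for period, months in sensitivity_cases(480).items():
        print(f"doubling every {period:.0f} months: 8-hour shift in {months:.0f} months")
```

A single pair of observations is noisy, so an implied period from one release is better treated as one evidence point for updating the prior than as a replacement for it.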
Amjad Masad cited Replit's own deployment as evidence: the company replaced 70% of its customer support function with AI agents. At current agent runtimes, these agents handle discrete support tickets — self-contained interactions that fit within the coherence window. As runtime extends, the same agents will handle multi-session account investigations and proactive outreach without human escalation.
Masad cited the Klarna CEO's public blog post reporting 2.3 million AI chat interactions per month, equating to the work of 700 full-time employees the company no longer needed to hire. This is a real-world data point anchoring where current agent runtimes sit on the labor-substitution curve: short-form customer interaction is already past the threshold.
Masad surfaced this framework during the Diary Of A CEO debate in 2025, attributing it to a recent academic paper on AI agent coherence. He used it to rebut the common technologist framing that AI capability is too vague to forecast, arguing that runtime duration — a measurable, testable metric — provides a concrete exponential to track rather than relying on impressionistic capability claims.
The corroborating data point he cited was OpenAI's o3 model, which he said had doubled its long-horizon task coherence in 3–4 months, faster than the 7-month baseline. He took this as empirical support for the curve and as a sign that the doubling period may be shortening. The framework quickly became the most cited analytical anchor in the debate, with even the skeptical panelist Bret Weinstein not disputing the direction of the trend.