INNOVATION

Months to result · 75% confidence

Agent Runtime Doubling Law

AI agent coherence time doubles every 7 months — model the exponential, not the snapshot

AI agents · exponential growth · timeline forecasting · inference demand

Problem it solves

Quantifying AI agent capability growth over time

Best for

Estimating when agentic AI transitions from task-tool to sustained-labor replacement — useful for investors, operators, and strategists planning AI adoption curves

Not ideal for

Short-term product roadmapping or quarterly planning; the doubling rate is a recent academic estimate and may not hold uniformly across model architectures

Overview

Why this framework exists

The Agent Runtime Doubling Law, cited by Amjad Masad (Replit CEO) from a recent academic paper, holds that the maximum coherent runtime of AI agents — the duration they can pursue a goal autonomously before losing coherence or hitting an unrecoverable error — doubles approximately every 7 months. At the time of recording, the baseline was roughly 30 minutes of sustained autonomous operation.

Extending the doubling curve from the 30-minute baseline: 7 months yields ~1 hour, 14 months ~2 hours, 21 months ~4 hours, and 28 months ~8 hours (a full working shift). At the same rate, 3 years (36 months) yields roughly 18 hours, with multi-day autonomous operation following within about another year. Masad noted OpenAI's o3 model appeared to double coherence over long-horizon tasks in just 3–4 months, suggesting the 7-month figure may be conservative.
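The projection above can be reproduced with a few lines of Python. This is a minimal sketch that assumes a clean exponential from the ~30-minute baseline; the function name and parameters are illustrative and do not come from the cited paper.

```python
def projected_runtime_minutes(months_elapsed, baseline_minutes=30.0, doubling_months=7.0):
    """Projected coherent runtime, assuming runtime doubles every `doubling_months`."""
    return baseline_minutes * 2 ** (months_elapsed / doubling_months)


# Milestones quoted in the text, plus the 3-year mark.
for months in (0, 7, 14, 21, 28, 36):
    hours = projected_runtime_minutes(months) / 60
    print(f"{months:>2} months: {hours:.1f} hours")
```

Changing `doubling_months` to 4 gives the faster case suggested by the o3 observation.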

The inflection point the framework identifies is when agents cross from 'useful tool' to 'sustained labor.' Below a full working shift, agents augment human workers. Once agents reliably operate for 8+ hours without interruption, the economic calculus for labor substitution changes structurally — not incrementally. The framework is most powerful as a forcing function: instead of asking 'can AI do X today?' it asks 'when does the runtime curve make X inevitable, and how far are we from that date?'

Core principles

5 total

Measure agent capability by runtime duration, not by benchmark scores — duration is what determines labor substitutability
Exponential curves matter most at inflection points: the shift from hours to a full working shift is categorically different from earlier increments
Inference token consumption scales superlinearly with runtime — longer autonomous runs are not linear extensions of shorter ones
Faster-than-expected empirical data (o3 at 3–4 months vs. 7-month baseline) should widen confidence intervals upward, not be discounted
The transition from augmentation to substitution is a threshold event, not a gradient — plan for discontinuity, not a smooth ramp

Steps

4 steps

Establish the current runtime baseline
Identify the current maximum coherent autonomous runtime for agents relevant to your domain: general-purpose LLM agents, code agents, customer service agents. At the time of this episode the baseline was ~30 minutes. Check recent papers or model release notes for updated figures.
Pro tip: Track domain-specific runtime, not just general benchmarks; coding agents and customer-service agents may have different coherence floors.
Project the doubling curve to your planning horizon
Using the 7-month doubling period as a conservative estimate, calculate when agents in your domain will reach the runtime thresholds that matter: 1 hour, 8 hours (a working shift), 24 hours (a full day). Map these dates against your operational or investment timeline.
Warning: Treat 7 months as the slow case for the doubling period, not the fast case. Empirical data from o3 suggests the curve may be compressing. Build sensitivity cases at 4 and 7 months.
Identify the labor-substitution threshold for your use case
Determine which runtime milestone crosses the economic threshold for replacing a human role in your context. For routine text-in/text-out roles this may be 2–4 hours; for roles requiring sustained project work it may be 8+ hours. This threshold — not AI capability in the abstract — is your planning trigger.
Pro tip: Ask: what runtime duration makes the human supervisor role economically marginal? That is the threshold to track.
Recalibrate quarterly against observed model releases
The 7-month figure is a snapshot from one paper. At each major model release, test or source data on whether coherence duration has shifted. If empirical data consistently beats the curve, compress your timeline; if it lags, extend it. The framework is only valuable if it is updated with actual observations.
Warning: Do not lock in the 7-month figure as a law; treat it as a prior to be updated with each new evidence point.
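Steps 1 to 3 can be sketched as a small calculation: given a baseline and a doubling period, how many months until a runtime threshold is reached? The sketch below runs the 4-month and 7-month sensitivity cases side by side. The function name and the threshold labels are illustrative assumptions, and the 30-minute baseline should be replaced with your own domain's figure.

```python
import math


def months_to_reach(target_minutes, baseline_minutes=30.0, doubling_months=7.0):
    """Months until runtime reaches `target_minutes`, assuming steady doubling."""
    doublings_needed = math.log2(target_minutes / baseline_minutes)
    return doublings_needed * doubling_months


# Runtime thresholds in minutes (labels are illustrative).
thresholds = {"1 hour": 60, "8 hours (shift)": 480, "24 hours": 1440}

for label, minutes in thresholds.items():
    fast = months_to_reach(minutes, doubling_months=4.0)  # compressed case
    slow = months_to_reach(minutes, doubling_months=7.0)  # conservative case
    print(f"{label}: {fast:.1f} to {slow:.1f} months from baseline")
```

The gap between the two cases widens with each doubling, which is why the sensitivity range matters more for the 8-hour and 24-hour thresholds than for the 1-hour one.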

Checklist


Identify the current coherent runtime baseline for agents in your target domain (not just general benchmarks)
Plot the 7-month doubling curve against your 1-year and 3-year planning horizons
Define the specific runtime threshold at which labor substitution becomes economically rational for your use case
Build sensitivity cases using 4-month and 7-month doubling periods
Track major model releases and update your runtime estimate quarterly with empirical observations
Account for superlinear inference cost scaling when modeling the economics of longer autonomous runs
Distinguish between runtime duration and runtime reliability — both must be tracked
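For the quarterly update item, one simple way to recalibrate is to back out the doubling period implied by two runtime observations. This is a sketch under the assumption of exponential growth between the two measurements; the function name and the sample observations are hypothetical, not data from the source.

```python
import math


def implied_doubling_months(runtime_then, runtime_now, months_between):
    """Doubling period implied by two runtime observations (same units)."""
    if runtime_then <= 0 or runtime_now <= runtime_then:
        raise ValueError("need positive runtimes with growth between observations")
    return months_between / math.log2(runtime_now / runtime_then)


# Hypothetical observations: 30 minutes, then 45 minutes three months later.
period = implied_doubling_months(30, 45, 3)
print(f"Implied doubling period: {period:.1f} months")
```

A result well below 7 months argues for compressing the timeline; a result above it argues for extending it. A single pair of observations is noisy, so treat the output as one evidence point rather than a new constant.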

Examples

2 cases

Replit's customer support automation

Amjad Masad cited Replit's own deployment as evidence: the company replaced 70% of its customer support function with AI agents. At current agent runtimes, these agents handle discrete support tickets — self-contained interactions that fit within the coherence window. As runtime extends, the same agents will handle multi-session account investigations and proactive outreach without human escalation.

Outcome: 70% reduction in customer support headcount requirement; direct evidence that current runtimes already cross the threshold for ticket-scoped autonomous support work.

Klarna CEO's 700-FTE displacement

Masad cited the Klarna CEO's public blog post reporting 2.3 million AI chat interactions per month, equating to the work of 700 full-time employees they no longer needed to hire. This is a real-world data point anchoring where current agent runtimes sit on the labor-substitution curve: short-form customer interaction is already past threshold.

Outcome: 700 FTEs' worth of labor absorbed by agents operating within current coherence windows, a deployed rather than projected validation of the framework's near-term range.

Common mistakes

4 traps

Treating the baseline snapshot as the planning horizon

Planners who see '30 minutes of coherent runtime today' and conclude 'agents can't replace sustained labor' have made a category error. The framework is about the trajectory, not the current state. The relevant question is when the curve crosses your threshold, not where it currently sits.

Assuming linear rather than superlinear inference cost scaling

An 8-hour task runs 16x longer than a 30-minute task, but its inference cost is likely more than 16x: longer tasks tend to involve more context management, re-planning, and error recovery, so token consumption grows faster than the runtime ratio suggests. Underestimating this understates inference demand and the cost of long autonomous runs.
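To make the superlinear point concrete, the sketch below uses a power-law cost model. The power-law form and the exponent values are assumptions chosen for illustration only; the source says scaling is superlinear but gives no functional form or measured exponent.

```python
def cost_multiplier(runtime_ratio, exponent):
    """Relative inference cost for a run `runtime_ratio` times longer.

    Assumes cost grows as runtime ** exponent. An exponent of 1.0 is linear;
    anything above 1.0 is superlinear. The exponent is a hypothetical parameter.
    """
    return runtime_ratio ** exponent


runtime_ratio = (8 * 60) / 30  # 8-hour run vs 30-minute run = 16x

for exponent in (1.0, 1.15, 1.3):
    multiple = cost_multiplier(runtime_ratio, exponent)
    print(f"exponent {exponent}: {multiple:.1f}x the 30-minute cost")
```

Even a modest exponent pushes the cost multiple well past 16x, so the exponent is worth estimating from your own usage logs rather than assuming linearity.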

Ignoring domain-specific variance in coherence rates

The 7-month doubling is an average across architectures and tasks. Coding agents, legal document agents, and customer-service agents have different coherence floors and different improvement trajectories. Applying a single curve to all domains produces planning errors.

Conflating runtime with reliability

An agent running for 8 hours is not the same as an agent running reliably for 8 hours. Error rates within the runtime window matter as much as the duration ceiling. Planning for substitution requires both metrics, not just the headline duration figure.
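One way to track both metrics together is to ask how likely a run of a given length is to finish without an unrecoverable error. The sketch below assumes a constant, independent failure probability per hour, which is a simplification introduced here and not a claim from the source; the 5% rate is a hypothetical figure.

```python
def success_probability(hours, hourly_failure_rate):
    """Chance of completing `hours` of runtime with no unrecoverable error.

    Assumes each hour fails independently with the same probability.
    """
    return (1.0 - hourly_failure_rate) ** hours


# Hypothetical 5% chance of an unrecoverable error in any given hour.
for hours in (1, 4, 8):
    p = success_probability(hours, 0.05)
    print(f"{hours}h run: {p:.0%} chance of clean completion")
```

Under these assumptions an agent that looks dependable over one hour completes a full shift cleanly only about two times in three, which is why the duration ceiling alone is not enough for substitution planning.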

Origin story

How this framework came to be

Masad surfaced this framework during the Diary Of A CEO debate in 2025, attributing it to a recent academic paper on AI agent coherence. He used it to rebut the common technologist framing that AI capability is too vague to forecast, arguing that runtime duration — a measurable, testable metric — provides a concrete exponential to track rather than relying on impressionistic capability claims.

The corroborating data point he cited was OpenAI's o3 model, which he said had doubled its long-horizon task coherence in 3–4 months, faster than the 7-month baseline. He took this as empirical support for the curve and as a sign that the true doubling period may be shorter than 7 months. The framework quickly became the most cited analytical anchor in the debate, with even the skeptical panelist Bret Weinstein not disputing the direction of the trend.

Source

Traced to primary

Source · PODCAST

AI AGENTS DEBATE: These Jobs Won't Exist In 24 Months!

Amjad Masad & Bret Weinstein · 2025
