Innovation Age Decay Analysis
Measure scientific novelty by tracking the 'age' of ideas in published papers over time.
The Innovation Age Decay Analysis is a data-driven framework for measuring the novelty of scientific research by analyzing the 'age' of ideas presented in published papers. It operates on the principle that scientific progress is reflected in the introduction of new concepts, methods, and terminology into the literature. By computationally analyzing all published papers in a field (e.g., biomedicine) over decades, the framework tracks when specific words and word combinations first appear. It then assesses individual papers by determining the 'age' (time since first introduction) of the newest ideas they contain. This creates a historical map of idea introduction and allows for the measurement of trends in innovation across time, funding sources, and researcher career stages. The core insight is that a field's health can be gauged by whether new papers are built on recent breakthroughs or are recycling older, established concepts.
The framework reveals systemic issues, such as a trend toward older ideas in NIH-funded work over recent decades, correlating with increased conservatism in grant funding. It also connects to career incentives, showing that younger scientists tend to work with newer ideas, but systemic barriers delay their ability to secure major grants, potentially stifling the most innovative period of their careers. This analysis provides an empirical basis for questioning funding models and evaluating the return on investment in scientific research.
- Scientific novelty can be objectively measured by tracking the first appearance of specific ideas (words/concepts) in the literature.
- The 'age' of the newest idea in a paper is a proxy for its innovativeness; newer ideas indicate more groundbreaking work.
- A healthy, progressing field shows a steady stream of papers built on ideas that are only a few years old.
- Funding systems that prioritize low-risk, predictable outcomes will show a measurable increase in the 'age' of ideas in supported work over time.
- Individual researchers are most innovative early in their careers; systemic delays in funding them directly reduce field-wide innovation.
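The first two principles can be stated compactly. The following is a notational sketch only; the symbols are assumptions introduced here for illustration and are not taken from the original work. Let $I(p)$ be the set of ideas (words or n-grams) in paper $p$, $t(p)$ its publication year, and $C$ the full corpus.

```latex
% b(i): birth year of idea i, the earliest publication year in which it appears
b(i) = \min \{\, t(q) : q \in C,\ i \in I(q) \,\}

% A(p): innovation age of paper p, measured from its newest idea
A(p) = t(p) - \max_{i \in I(p)} b(i)
```

Because every idea in $p$ was born no later than $t(p)$, $A(p) \ge 0$, and smaller values correspond to papers that draw on more recently introduced ideas.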
- Build the Historical Corpus: Compile the full text of every published paper in the target field (e.g., biomedicine) for as many decades as data allows. This creates the raw dataset for analysis.
  Pro tip: Use existing digital archives and databases (like PubMed) and leverage computational tools for bulk data acquisition and cleaning.
  Warning: Ensure consistent parsing of text across different publication formats and eras to avoid introducing noise into the word analysis.
- Map Idea Introduction Over Time: For each year, analyze the corpus to extract all unique words and word combinations (n-grams). Identify which of these are new by comparing them against all words and n-grams from previous years. This creates a timeline marking the 'birth year' of every idea.
  Pro tip: Focus on meaningful n-grams (e.g., 'polymerase chain reaction') rather than single common words to capture substantive concepts.
  Warning: Be mindful of synonyms and evolving terminology; 'cancer' and 'carcinoma' might refer to similar concepts introduced at different times.
- Calculate Idea Age for Each Paper: For any given published paper, analyze its text to identify the ideas (words/n-grams) it uses. Determine the 'birth year' for each idea from your historical map. The 'innovation age' of the paper is the difference between its publication year and the birth year of the *newest* idea it contains.
  Pro tip: A paper using 'polymerase chain reaction' (born 1983) in 2020 is using an idea that is 37 years old, a well-established method. If that is the newest idea in the paper, the paper's innovation age is 37 years.
  Warning: A paper can have a low innovation age for trivial reasons (e.g., a new acronym for an old concept). Context and manual spot-checking are valuable.
- Aggregate and Analyze Trends: Calculate average innovation ages for papers grouped by year, funding source (e.g., NIH vs. foundation), researcher career stage, or institution. Plot these trends over time to visualize stagnation or progress.
  Pro tip: Compare the innovation age trend for NIH-funded papers against the overall field trend to isolate the effect of the funding system.
  Warning: Correlation is not causation. A rising innovation age indicates less novel work but requires further investigation into the underlying drivers (funding incentives, risk aversion, etc.).
- Connect to Systemic Variables: Cross-reference innovation age data with other metrics: the age at which researchers win their first major grant (R01), university indirect cost rates, geographic distribution of funding, and tenure rates. Look for correlations that explain the trends.
  Pro tip: The framework shows that as the age for winning a first R01 grant has increased from the mid-30s to the mid-40s, the innovation age of published work has also increased.
  Warning: This is a diagnostic tool, not a prescriptive one. It points to problems but doesn't automatically generate solutions; that requires policy and cultural change.
- Communicate Findings for Policy: Translate the data into clear narratives: 'The NIH now funds work based on ideas that are 7-8 years old, versus 1-3 years old in the 1980s.' Connect this to outcomes like flatlining life expectancy despite biomedical advances.
  Pro tip: Frame the issue as a misalignment between public investment and public health return, making it a matter of accountability and mission failure.
  Warning: Avoid oversimplifying; acknowledge that some incremental work is necessary, but the balance has shifted too far from high-risk, high-reward exploration.
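The mapping and scoring steps above (idea birth years, then per-paper innovation age) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the lowercase alphanumeric tokenizer, the n-gram length limit, and the `min_papers` filter are all hypothetical choices made here, and the four-sentence corpus is an invented toy. The filter is included because, without it, any one-off phrase in a paper would count as a brand-new idea and push its innovation age to zero, which is the trivial-novelty problem the warning above describes.

```python
import re
from collections import defaultdict


def extract_ideas(text, max_n=3):
    """Return the set of word n-grams (n = 1..max_n) found in a text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    ideas = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ideas.add(" ".join(tokens[i:i + n]))
    return ideas


def build_birth_years(corpus, min_papers=2):
    """Map each idea to the earliest year it appears in the corpus.

    corpus: iterable of (year, text) pairs.
    min_papers: hypothetical filter; ideas seen in fewer papers are dropped
    so that one-off phrases are not treated as new ideas.
    """
    first_year = {}
    paper_count = defaultdict(int)
    for year, text in corpus:
        for idea in extract_ideas(text):  # a set, so counted once per paper
            paper_count[idea] += 1
            first_year[idea] = min(year, first_year.get(idea, year))
    return {i: y for i, y in first_year.items() if paper_count[i] >= min_papers}


def innovation_age(year, text, birth):
    """Publication year minus the birth year of the newest tracked idea.

    Returns None when the paper contains no tracked idea.
    """
    births = [birth[i] for i in extract_ideas(text) if i in birth]
    if not births:
        return None
    return year - max(births)


# Invented toy corpus; the 1983 date follows the PCR example in the text.
corpus = [
    (1980, "dna sequencing of bacterial genomes"),
    (1983, "polymerase chain reaction amplifies dna"),
    (1990, "polymerase chain reaction for dna sequencing"),
    (2020, "dna sequencing with polymerase chain reaction"),
]
birth = build_birth_years(corpus)
ages = [innovation_age(year, text, birth) for year, text in corpus]
print(ages)  # [0, 0, 7, 37]
```

On a real corpus the tokenizer and the idea filter matter far more than this sketch suggests; synonym handling and manual spot-checks, as the warnings note, would sit on top of it.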
Bhattacharya and Packalen applied the framework to the entire corpus of NIH-funded biomedical research papers from the 1980s to the 2010s. They tracked the 'innovation age'—the age of the newest idea in each paper—over time.
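Tracking innovation age over time amounts to a grouped average. The snippet below is a hedged sketch of that aggregation, assuming each paper has already been scored; the record fields (`year`, `funder`, `innovation_age`), the function name, and the sample values are placeholders invented for illustration and are not results from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_age_by_group(papers, keys=("year", "funder")):
    """Average innovation age for each combination of the given keys."""
    groups = defaultdict(list)
    for paper in papers:
        group = tuple(paper[k] for k in keys)
        groups[group].append(paper["innovation_age"])
    # Sort so the output reads chronologically when "year" is the first key.
    return {group: mean(ages) for group, ages in sorted(groups.items())}


# Placeholder records with neutral funder labels; not real data.
papers = [
    {"year": 1985, "funder": "A", "innovation_age": 2},
    {"year": 1985, "funder": "A", "innovation_age": 4},
    {"year": 1985, "funder": "B", "innovation_age": 1},
    {"year": 2015, "funder": "A", "innovation_age": 8},
    {"year": 2015, "funder": "B", "innovation_age": 5},
]
trend = mean_age_by_group(papers)
for group, avg in trend.items():
    print(group, avg)
```

Passing `keys=("year",)` gives the field-wide trend, which is the baseline a funder-specific trend would be compared against.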
The framework's data were also cross-referenced with demographic data on researchers. The authors analyzed the age at which scientists received their first major R01 grant and compared it to the innovation age of the work they published.
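One simple way to quantify such a comparison is a Pearson correlation between two paired series, for example first-R01 age per cohort and mean innovation age per cohort. The sketch below is an assumption about how one might do this, not the authors' stated method; the function name is hypothetical and the inputs are neutral toy numbers rather than study data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy inputs only: a perfectly linear pair and a perfectly inverse pair.
r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])
print(round(r_up, 6), round(r_down, 6))
```

A coefficient near 1 would only say the two series move together; it would not by itself show that later funding causes older ideas.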
Developed by Dr. Jay Bhattacharya and his colleague Mikko Packalen (University of Waterloo), this framework emerged from a decade of work, before the COVID-19 pandemic, on measuring the innovativeness of scientific portfolios. It was born from a desire to move beyond anecdotal claims about scientific stagnation and create an objective, scalable metric. The methodology leverages the digitization of scientific literature and computational text analysis to map the entire history of a field's published ideas. A key paper applying this framework was published on the eve of the pandemic, asking 'how innovative is the NIH portfolio?' The work was motivated by observable trends in grant conservatism and concerns that the system was increasingly rewarding incremental 'crank-turning' science over bold, high-risk hypothesis testing.