Exceptional Research from The Unjournal: Papers Evaluated in 2024-2025

Categories: data, analysis, highlights, 2024
Author

David Reinstein with Claude Code assistance

Published

December 29, 2025

Introduction

The Unjournal is revolutionizing academic peer review by providing public, rigorous evaluations of research that matters for global priorities—before it’s locked behind journal paywalls or delayed by traditional publishing timelines. Our evaluators don’t just assess research; they provide detailed, quantitative ratings across multiple dimensions, from methodological rigor to real-world relevance.

This post analyzes the 45 papers with full evaluations completed by The Unjournal in 2024-2025, identifying exceptional research based on:

  • High merited journal tier ratings (4+ on 1-5 scale, where 5 = top-tier journal quality)
  • Careful, detailed evaluations reflecting genuine assessment
  • Confidence-adjusted scores (narrow confidence intervals = more certain judgments)
  • “Diamond in the rough” potential (merited journal tier > predicted journal tier)
Note: About This Analysis

This analysis was prepared with substantial assistance from Claude Code (Anthropic’s AI assistant). While Claude helped identify patterns, extract data, and draft content, all claims should be verified against the original evaluation summaries linked throughout.

Important caveats to keep in mind:

  • Aggregate scores combine multiple dimensions; individual ratings may tell different stories
  • Papers with single evaluators have higher uncertainty than multi-evaluator papers (marked with * in rankings)
  • High evaluator disagreement (marked with † in rankings) often reflects genuine methodological debate rather than problems with paper quality
  • Author engagement status reflects response to this evaluation process, not general scholarly openness
  • The Unjournal’s value lies not just in ratings, but in the substantive dialogue between evaluators and authors

The 2024-2025 Cohort: Overview

**Papers evaluated:** 45
**Mean merited journal tier score:** 3.84/5.0
**Mean overall rating:** 75.5/100
**Papers scoring 4.5+ on merited journal tier:** 8
**Papers with diamond scores ≥0.5:** 7
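As a rough sketch, the overview statistics above can be computed from a ratings table like the one below. The record structure and field names here are illustrative, not The Unjournal's actual data schema, and the three sample rows are made up for demonstration:

```python
# Hypothetical ratings records; field names are illustrative only.
papers = [
    {"merited_tier": 4.45, "overall": 90.5, "predicted_tier": 4.0},
    {"merited_tier": 4.50, "overall": 85.0, "predicted_tier": 2.0},
    {"merited_tier": 3.10, "overall": 67.5, "predicted_tier": 3.0},
]

# Mean merited journal tier across the cohort
mean_tier = sum(p["merited_tier"] for p in papers) / len(papers)

# Papers at or above the 4.5 "top-tier" threshold
top_tier = [p for p in papers if p["merited_tier"] >= 4.5]

# "Diamonds in the rough": merited tier exceeds predicted tier by >= 0.5
diamonds = [p for p in papers
            if p["merited_tier"] - p["predicted_tier"] >= 0.5]

print(f"Mean merited tier: {mean_tier:.2f}")
print(f"Papers at 4.5+: {len(top_tier)}")
print(f"Diamond scores >= 0.5: {len(diamonds)}")
```

The same three filters, applied to the full 45-paper dataset, yield the counts reported above.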

Quality Distribution

Quality scatter plot showing merited journal tier vs. overall scores for all 45 papers evaluated in 2024-2025. Point size indicates number of evaluators; color indicates aggregate quality score.

The 2024-2025 cohort reflects The Unjournal’s global priorities focus: environmental and health research dominate, with substantial representation across development economics, democratic governance, and philosophy/ethics (see topic chart below).

Exemplary Scholarly Dialogue

The Unjournal’s value lies not just in ratings, but in the substantive exchange between evaluators and authors. The following papers demonstrate this dialogue at its best—cases where authors engaged deeply with evaluator critiques, leading to improved understanding even when final ratings varied widely.

Water Treatment and Child Mortality

Ratings: 50/100 vs. 85/100 (35-point evaluator disagreement) | Response: 10,246 words

This paper exemplifies productive scholarly tension. The first evaluation team (Sharma Waddington & Masset) gave detailed methodological critiques—questioning study inclusion transparency, undisclosed deviations from pre-registration, and policy implications—resulting in a 50/100 rating. The second evaluator found “no major technical issues” and rated it 85/100.

Rather than simply defend their work, the authors updated the working paper in response. The evaluation manager notes:

“The evaluators noted a number of ways that the paper could be improved, it in turn was improved, and while the headline result remains largely the same, we have now learned some new things about the stability of the results… I extend my sincere thanks to the evaluators and the authors for what I view as a fulfilling and transparent cooperative effort to reach the best possible answer to an important question.”

What this shows: High evaluator disagreement doesn’t necessarily signal a weak paper; it can reflect genuine methodological debate. The 35-point split here represents different standards: rigorous meta-analysis methodology (E1) versus practical utility and cost-effectiveness contribution (E2).


The Effect of Public Science on Corporate R&D

Ratings: 64/100 vs. 75/100 | Response: 6,509 words

This evaluation produced what the manager called “a particularly rich public discussion of econometric identification issues.” Evaluator 1 raised fundamental concerns about the identification strategy, citing “bad controls and a shift-share design with endogenous shares.” Evaluator 2 (Bournakis) found the results “convincing” while suggesting robustness checks.

The authors (Arora et al.) responded with a sophisticated 6,500-word defense, directly engaging with advanced econometric theory—explaining why lagged, predetermined shares mitigate endogeneity concerns and why shift-share designs don’t require share exogeneity if shifts are exogenous.

What this shows: Even when evaluators fundamentally disagree about methodology, the resulting public exchange creates lasting scholarly value—a resource for anyone working on similar identification problems.


StrongMinds Cost-Effectiveness

Ratings: 85/100 vs. 87/100 (aligned scores) | Response: 7,485 words

Unlike the papers above, these evaluators gave nearly identical overall scores. Yet their qualitative discussions reveal substantive disagreement masked by aligned numbers. Evaluator 1 praised the work’s rigor while flagging “ad hoc adjustments.” Evaluator 2 went further, calling many decisions “arbitrary and subjective” and recommending a “multiverse approach.”

The Happier Lives Institute authors responded in detail, defending their choices as “principled decisions with empirical bases rather than pure subjective guesses”—while also clarifying why systematic CEAs require judgment calls where academic consensus doesn’t exist.

What this shows: Similar ratings can mask different concerns. Reading only the numbers (85/87) misses the substantive methodological debate about how to handle unavoidable uncertainty in cost-effectiveness analysis.


Resilient Foods for Global Catastrophes

Response: 3,192 words | Evaluation: 3 evaluators

This paper’s authors demonstrated intellectual humility in their response. Rather than defending every position, they:

“Commend the format and reviewer comments which were of extremely high quality… We value good epistemics and understand that it takes many people critically looking at a problem to achieve this, which is what motivated our participation in the Unjournal pilot.”

They acknowledged that their comparison with AGI safety needed clarification, accepted criticism about lack of sensitivity analysis for discount rates, and agreed their cost estimates were “likely low.”

What this shows: Substantive engagement doesn’t require defending every choice. Acknowledging limitations and suggesting future work can be as valuable as robust defense.


Top Papers: Detailed Profiles

Climate & Environmental Research

#1: A Welfare Analysis of Policies Impacting Climate Change

Aggregate Score: 0.84 (Highest in 2024-2025) | Merited journal tier: 4.45/5.0 | Overall: 90.5/100

This paper applies the Marginal Value of Public Funds (MVPF) framework to 96 US environmental policies, providing rigorous welfare rankings beyond simple cost-per-ton metrics.

Evaluator 1 (Johannes Emmerling):

“This is an excellent paper that makes significant contributions to climate policy evaluation. The authors’ application of the MVPF framework to analyze 96 US environmental policies provides valuable insights that traditional cost-per-ton metrics often miss.”

Evaluator 2 (Frank Venmans):

“Under which policy does a dollar of subsidy create the largest impact on welfare? The paper answers these questions by developing the Marginal Value of Public Funds (MVPF). This allows [one] to rank government policies according to their welfare impacts while respecting a given government budget.”

Author engagement: The authors declined to respond to evaluations but noted they may wish to do so at a later date.

DOI: 10.3386/w32728


#2: Urban Forests: Environmental Health Values and Risks

Aggregate Score: 0.76 | Merited journal tier: 4.35/5.0 | Overall: 75/100

This paper examines urban forests’ environmental health benefits and risks, combining ecological and economic analysis.

Author engagement: The authors granted permission to evaluate this paper and were very responsive and encouraging throughout the process. They opted not to provide a formal response to the evaluations but remain open to future engagement.


#3: The Macroeconomic Impact of Climate Change: Global vs. Local Temperature

Aggregate Score: 0.76 | Merited journal tier: 4.6/5.0 | Overall: 87.5/100

Estimates macroeconomic damages from climate change at roughly six times previous estimates (a 12% vs. 2% reduction in GDP per 1°C of warming).

Evaluation highlight:

“Overall, the manuscript is well written in the sense that, since the reader understands immediately what is being estimated and why, they could be lulled into the trap of thinking that the research is ‘obvious.’ But it is the careful empirical construction of the argument, and the identification that a literature has been mechanically missing (netting out) the effect of global warming, that makes this research important.”

“The manuscript contains numerous robustness tests, including [some…] involving much work (such as re-estimation with a temperature measure independently created [from] the primary one), and clear explanations about why each was implemented.”

Author engagement: The authors were offered the opportunity to respond to these evaluations but declined.

DOI: 10.3386/w32450


Democratic Governance & Political Economy

#4: Misperceptions and Demand for Democracy under Authoritarianism

Aggregate Score: 0.74 | Merited journal tier: 4.2/5.0 | Overall: 86.5/100 | Hours: 16

Field experiment in Turkey showing information interventions increased opposition vote share by 0.8 percentage points (1.5% relative increase).

Evaluation:

“This paper makes a valuable contribution to research on authoritarianism and democratic resilience by combining an online survey and a large-scale field experiment in Turkey… The study is theoretically ambitious, well-powered, and methodologically transparent, with the field experiment demonstrating real-world behavioral effects on opposition vote share.”

“I am inclined to believe this claim as the field experiment is well-powered, pre-registered, and analyzed using appropriate instrumental variable (2SLS) methods, which lends credibility to the causal inference.”

Author engagement: The authors were aware of the evaluation and kept evaluators updated on the most recent versions of their work. They were unable to commit to a formal response timeline.

DOI: 10.3386/w33018


#5: Does the Squeaky Wheel Get More Grease?

Aggregate Score: 0.73 | Merited journal tier: 4.55/5.0 | Overall: 85/100

Published in American Economic Review. Innovative RCT on citizen reporting of pollution violations in China, capturing both direct and spillover effects.

Evaluation:

“This is clearly a very neat experimental design. All prefectures contain some treated firms, but the intensity of treatment (70% or 95% of firms assigned to treatment) varies so that they can assess the ‘general equilibrium’ effect.”

DOI: 10.1257/aer.20221215


Behavioral Interventions & Development

#6: Selecting the Most Effective Nudge

Aggregate Score: 0.73 | Merited journal tier: 4.6/5.0 | Overall: 90/100

Large-scale experiment on immunization nudges, identifying most effective behavioral interventions.

Author engagement: The authors acknowledged the evaluations and declined to respond formally, noting that the paper had been accepted for publication in Econometrica.


#7: The Long-Run Effects of Psychotherapy

Aggregate Score: 0.71 | Merited journal tier: 4.35/5.0 | Overall: 84/100

Tracks long-term impacts of psychotherapy on depression, beliefs, and economic outcomes.


#8: Does Conservation Work in General Equilibrium?

Aggregate Score: 0.70 | Merited journal tier: 4.0/5.0 | Overall: 70/100

Examines whether conservation interventions work when accounting for general equilibrium effects.

Author engagement: The authors granted permission for evaluation under The Unjournal’s interim policy, appreciated the suggestions, and indicated they are working to address them in their next draft. They have expressed willingness to respond formally in the future.


Cash Transfers & Poverty Reduction

#9: How Effective Is (More) Money?

Aggregate Score: 0.69 | Merited journal tier: 4.35/5.0 | Overall: 82/100

Randomizes unconditional cash transfer amounts in the US to understand marginal impacts.


#10: The wellbeing cost-effectiveness of StrongMinds

Aggregate Score: 0.69 | Merited journal tier: 4.2/5.0 | Overall: 86/100

Systematic review and meta-analysis of psychotherapy cost-effectiveness, including charity-specific data from Happier Lives Institute.

Author engagement: The Happier Lives Institute authors provided a detailed and careful written response to the evaluations, engaging substantively with methodological critiques and defending their analytical choices as principled and conservative.


Diamonds in the Rough

Papers where evaluators saw significantly higher potential than predicted journal placement (difference ≥0.5 on the 1-5 scale):

Diamond plot showing merited journal tier vs. predicted journal tier. Red points above the diagonal with ≥0.5 difference are “diamonds in the rough” where evaluators rated merit substantially higher than predicted placement.
Papers with Diamond Scores ≥0.5

| Paper Title | Merited Journal Tier | Predicted Tier | Diamond Score |
|---|---|---|---|
| The Returns to Science In the Presence of Technological Risks | 4.5 | 2.0 | 2.5 |
| The Comparative Impact of Cash Transfers and a Psychotherapy Program on Psychological and Economic Well-being | 4.5 | 3.5 | 1.0 |
| Artificial Intelligence and Economic Growth | 4.5 | 3.5 | 1.0 |
| The wellbeing cost-effectiveness of StrongMinds and Friendship Bench: Combining a systematic review and meta-analysis with charity-related data | 4.2 | 3.5 | 0.7 |
| Towards best practices in AGI safety and governance | 4.0 | 3.5 | 0.5 |
| Advance Market Commitments: Insights from Theory and Experience | 4.5 | 4.0 | 0.5 |
| When Celebrities Speak: A Nationwide Twitter Experiment Promoting Vaccination In Indonesia | 4.0 | 3.5 | 0.5 |
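The diamond score is simply the merited journal tier minus the predicted tier. A minimal sketch, using the values from the table above (titles abbreviated for readability):

```python
# (title, merited tier, predicted tier) tuples from the diamond table above
rows = [
    ("Returns to Science", 4.5, 2.0),
    ("Cash Transfers vs. Psychotherapy", 4.5, 3.5),
    ("AI and Economic Growth", 4.5, 3.5),
    ("StrongMinds / Friendship Bench", 4.2, 3.5),
    ("AGI safety and governance", 4.0, 3.5),
    ("Advance Market Commitments", 4.5, 4.0),
    ("Celebrities / Vaccination", 4.0, 3.5),
]

# Diamond score = merited - predicted, sorted largest gap first
diamonds = sorted(
    ((title, round(merited - predicted, 1))
     for title, merited, predicted in rows),
    key=lambda pair: -pair[1],
)

for title, score in diamonds:
    print(f"{title}: +{score}")
```

Sorting by the gap rather than by raw merit is what surfaces papers like Returns to Science, whose merited tier (4.5) far exceeds its predicted placement (2.0).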

Notable Diamonds

Seven papers have diamond scores ≥0.5 (see the table above); four stand out:

The Returns to Science In the Presence of Technological Risks (Diamond: +2.5) - Merited journal tier: 4.5, Predicted: 2.0 - Largest gap between merit and predicted tier; evaluators saw exceptional methodological quality despite perceived challenges in traditional publication venues

The Comparative Impact of Cash Transfers and Psychotherapy (Diamond: +1.0) - Merited journal tier: 4.5, Predicted: 3.5 - Head-to-head comparison of interventions valued for rigorous design and policy relevance

Artificial Intelligence and Economic Growth (Diamond: +1.0) - Merited journal tier: 4.5, Predicted: 3.5 - Foundational economic growth theory applied to AI, recognized for long-term significance

The wellbeing cost-effectiveness of StrongMinds (Diamond: +0.7) - Merited journal tier: 4.2, Predicted: 3.5 - Rigorous meta-analysis valued higher than its predicted placement


Additional Exceptional Papers

Science Policy & Innovation

#11: Adaptability and the Pivot Penalty in Science (0.69) - Merited journal tier: 4.5/5.0 - Examines costs and benefits of researchers pivoting between topics

#14: Artificial Intelligence and Economic Growth (0.67) - Merited journal tier: 4.5/5.0 - Economic growth theory applied to AI


Global Development

#12: The Comparative Impact of Cash Transfers and Psychotherapy (0.68) - Merited journal tier: 4.5/5.0 - Head-to-head comparison of interventions

#13: Building Resilient Education Systems (0.68) - Merited journal tier: 4.1/5.0 - Evidence from large-scale randomized trials across five countries

#15: Universal Basic Income (0.67) - Merited journal tier: 4.35/5.0 - Short-term results from long-term experiment in Kenya

#17: Intergenerational Child Mortality Impacts of Deworming (0.66) - Merited journal tier: 4.0/5.0 - 20-year follow-up showing intergenerational effects


Philosophy & Ethics

#18: Population ethical intuitions (0.65) - Merited journal tier: 4.05/5.0 - Survey evidence on population ethics


Understanding Evaluator Agreement and Uncertainty

Not all ratings carry equal confidence. When interpreting the rankings below, consider:

High-Confidence Ratings (Strong Evaluator Agreement)

Papers where evaluators gave similar scores (merit_std ≤ 0.2) represent our most reliable assessments:

  • StrongMinds (merit: 4.2, std: 0.0) — Perfect agreement between evaluators
  • Money Not to Burn (merit: 4.0, std: 0.0) — Perfect agreement
  • Environmental Effects of Economic Production (merit: 4.0, std: 0.0) — Perfect agreement
  • Effective Nudge (merit: 4.6, std: 0.14) — Near-perfect agreement
  • Macro Climate Change (merit: 4.6, std: 0.14) — Near-perfect agreement

Lower-Confidence Ratings (Evaluator Disagreement or Single Evaluator)

Some papers, including highly rated ones, carry important uncertainty caveats:

**Single-evaluator papers (*)** received high ratings but reflect only one perspective:

  • Returns to Science (merit: 4.5, single evaluator) — Highest diamond score (+2.5) but only one evaluation
  • Pivot Penalty (merit: 4.5, single evaluator) — Top-tier rating, one evaluation
  • AGI Safety (merit: 4.0, single evaluator) — One evaluator only

**High-disagreement papers (†)** are those where evaluators saw different strengths:

  • AI and Economic Growth (merit: 4.5, std: 0.71) — Foundational work with evaluator disagreement
  • Cash Transfers vs. Psychotherapy (merit: 4.5, std: 0.71) — High ratings but varied perspectives
  • Water Treatment (merit: 3.1, std: 0.85) — 35-point split (50 vs 85), reflecting methodology vs. utility debate

Confidence-Adjusted Interpretation

When comparing papers, consider:

Effective confidence = raw_merit - uncertainty_penalty

A paper with merit 4.6 and std 0.14 is more reliably excellent than one with merit 4.5 and std 0.71. Single-evaluator papers, while potentially accurate, lack the validation of multiple independent assessments.
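The adjustment described above can be sketched as a simple penalty function. The penalty weights here are illustrative assumptions, not The Unjournal's actual formula: the standard deviation enters linearly, and single-evaluator papers take a fixed extra penalty for lacking independent validation:

```python
def effective_confidence(merit, std, n_evaluators,
                         std_weight=1.0, single_penalty=0.3):
    """Illustrative confidence adjustment: penalize disagreement (std)
    and single-evaluator papers. Weights are assumptions, not official."""
    penalty = std_weight * std
    if n_evaluators < 2:
        penalty += single_penalty
    return merit - penalty

# The comparison from the text: merit 4.6 / std 0.14 vs. merit 4.5 / std 0.71
a = effective_confidence(4.6, 0.14, 2)  # ≈ 4.46
b = effective_confidence(4.5, 0.71, 2)  # ≈ 3.79
print(a > b)  # the lower-std paper wins under these assumptions
```

Under any positive `std_weight`, the merit-4.6/std-0.14 paper ranks above the merit-4.5/std-0.71 paper, matching the interpretation above.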


Complete Rankings

Warning: Interpret with Caution
  • Papers marked with * have only one evaluator (higher uncertainty)
  • Papers marked with † have high evaluator disagreement (merit_std > 0.5)
  • Aggregate scores combine multiple dimensions; individual ratings may tell different stories

Patterns Across High-Quality Research

1. Methodological Excellence is Universal

Every top-rated paper demonstrates:

  • Pre-registration or transparent research design
  • Appropriate statistical methods
  • Extensive robustness checks
  • Clear identification strategies

2. Real-World Relevance Distinguishes Impact

Papers scoring highest provide actionable insights:

  • Climate policy rankings (welfare analysis)
  • Election intervention effects (democracy paper)
  • Pollution enforcement strategies (China governance)
  • Cash transfer optimization (multiple papers)

3. Long-Term Data Reveals Hidden Effects

  • Deworming (20-year follow-up): Intergenerational effects
  • Psychotherapy (long-run): Persistent belief changes
  • UBI Kenya: Multi-year household impacts

4. Innovative Designs Get Recognized

Evaluators praised:

  • Macro-climate: Identifying what literature mechanically missed
  • China governance: Spillover effects design
  • Democracy Turkey: Administrative ballot data + IV methods

5. Evaluation Depth Correlates with Quality

Papers receiving the most thorough evaluations (12-16 hours) scored highest, suggesting that:

  • Evaluators invest time in proportion to quality
  • Detailed assessment identifies genuine strengths
  • Thorough feedback creates potential for improvement

6. Topic Distribution Reflects Global Priorities

The distribution aligns with The Unjournal’s focus on high-impact research for global priorities. Environmental and health economics dominate—reflecting urgent climate and public health challenges—while democracy, development, and existential risk research receive substantial attention.

What The Unjournal Adds

These evaluations demonstrate The Unjournal’s unique value:

1. Public, Detailed Feedback Instead of anonymous accept/reject decisions, authors and readers see exactly what evaluators think, with evidence and reasoning.

2. Quantitative Ratings Multi-dimensional scoring reveals strengths and weaknesses more precisely than traditional reviews.

3. Timely Assessment Many papers evaluated as working papers, providing feedback before traditional journal review.

4. Focus on Impact Criteria like “real-world relevance” and “global priorities relevance” ensure practical importance matters.

5. Transparent Disagreement When evaluators disagree, both perspectives are public, enriching discourse.

Key Statistics

**Papers with merited journal tier ≥ 4.0:** 28 (62%)
**Papers with merited journal tier ≥ 4.5:** 8
**Total evaluations:** 83
**Average evaluations per paper:** 1.8

Conclusion

The 45 papers evaluated by The Unjournal in 2024-2025 showcase research that matters: rigorous methods applied to consequential questions, from climate policy and democratic governance to poverty reduction and long-term development.

These papers earned high ratings through:

  • Research design excellence
  • Data quality and transparency
  • Methodological rigor
  • Real-world relevance
  • Open science practices

As The Unjournal continues expanding coverage, patterns will emerge: Which research areas consistently produce higher-quality work? Does evaluation depth predict research impact? Can we identify “diamonds in the rough” earlier?

For now, these 45 papers represent exceptional research—and they’re all publicly accessible, with detailed evaluations available for anyone to read.

Resources


Analysis conducted using data current as of December 29, 2025. Papers are those with evaluation ratings submitted to the database from January 2024 through December 2025. All evaluation excerpts quoted from public Unjournal evaluations at unjournal.pubpub.org.