Overview
We’ve collected feedback from evaluators through survey questions in our evaluation forms. With over 100 completed evaluations (111 in this analysis) from both our academic and applied streams, we can now share insights on time commitments, evaluator experience, and process feedback.
Time Commitment
Evaluators spend an average of 9.6 hours per evaluation (median: 8 hours), with reported times ranging from 3 to 32 hours. This spread reflects differences in paper complexity and in evaluators’ approaches.
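For readers who want to reproduce these figures from the survey export, the snippet below is a minimal sketch assuming a CSV with a hypothetical `hours_spent` column; the actual file and column names in our repository may differ.

```python
import pandas as pd

# Hypothetical file and column name; the real survey export likely differs.
df = pd.read_csv("evaluator_survey.csv")
hours = df["hours_spent"].dropna()

print(f"n = {len(hours)}")
print(f"mean = {hours.mean():.1f} h, median = {hours.median():.1f} h")
print(f"range = {hours.min():.0f} to {hours.max():.0f} h")
```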
Figure: Distribution of evaluation time
Evaluator Experience
Our evaluators are highly experienced. Of the 78 respondents who reported their years in the field, the median is 10 years (mean: 9.3), and 44 (56%) report 10 or more years.
For review experience, 70 respondents provided counts, with a median of 22.5 papers reviewed (mean: 41.8); 11 evaluators (16%) have reviewed 100 or more papers.
Figure: Evaluator experience distribution; papers reviewed shown on a log scale
A sample of the free-text responses (as submitted) illustrates their variety:

| Years in Field | Papers Reviewed |
|---|---|
| Eighteen years working on digital advertising, thirty years evangelizing for the use of field experiments in economics. | Probably around 200 over 30 years |
| More than 10 years | Close to 100 |
| 15 | 25+ |
| 10 years | >100 |
| 10 years | 100+ |
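Because respondents answered these questions in free text (as the sample above shows), the experience statistics require extracting a numeric value from each response. The sketch below is one simple way to do this, assuming hypothetical column names; it is not the exact cleaning logic in our repository, and spelled-out numbers would still need manual coding.

```python
import re
import pandas as pd

def first_number(text):
    """Return the first numeric value found in a free-text response, or None."""
    if not isinstance(text, str):
        return None
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

# Hypothetical column names; spelled-out numbers ("Eighteen years") would
# still need manual coding before or after this step.
df = pd.read_csv("evaluator_survey.csv")
years = df["years_in_field"].map(first_number).dropna().astype(float)
papers = df["papers_reviewed"].map(first_number).dropna().astype(float)

print(f"Years in field: median {years.median():.0f}, mean {years.mean():.1f}, "
      f"{(years >= 10).mean():.0%} report 10+ years")
print(f"Papers reviewed: median {papers.median():.1f}, mean {papers.mean():.1f}, "
      f"{(papers >= 100).mean():.0%} report 100+ papers")
```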
Process Feedback
38 evaluators provided feedback on our template and process.
Highlights of positive feedback:
- “two enthusiastic thumbs up”; “9.5/10”
- “Process was clear, template was helpful”
- “Very thorough evaluation… enjoyed providing point estimates alongside text-based review”
Key areas for improvement:
- Questionnaire length: “Too many questions,” “annoying after spending time on referee report”
- Usability: “Sliders are confusing/wonky,” “metrics with intervals is confusing”
- Clarity: “Would be good to have worked examples,” “process not intuitive”
Willingness to Re-evaluate
When asked about evaluating revised versions of papers, 27 of 41 responses (~66%) included “yes.” This high engagement suggests evaluators find the process worthwhile despite the friction points noted above.
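The ~66% figure comes from counting free-text responses that contain “yes”; a minimal sketch of that count, again with an assumed column name, looks like this.

```python
import pandas as pd

# Assumed column name for the free-text re-evaluation question.
df = pd.read_csv("evaluator_survey.csv")
responses = df["willing_to_reevaluate"].dropna().str.lower()

n_yes = responses.str.contains("yes").sum()
share = n_yes / len(responses)
print(f"{n_yes} of {len(responses)} responses ({share:.0%}) include 'yes'")
```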
Fields Represented
20 evaluators provided their field of expertise, representing diverse backgrounds including development economics, experimental economics, environmental economics, political science, AI safety, meta-analysis, and more.
| Field of Expertise |
|---|
| Field experiments on the effects of advertising campaigns |
| I conduct impact evaluations related to the effectiveness of social safety net programs and have studied cash vs food preferences in similar contexts to those studied in this paper. |
| small scale irrigation (social science) |
| moral psychology, behavioural economics |
| Development Studies/ Agricultural Economist |
| Meta-analysis |
| Experimental economics |
| Political Science; Authoritarian Politics; MENA Politics; Experimental Methods |
| Development economics, with some work on irrigation. |
| Political economy, media and politics |
| experimental economics |
| Experimental and behavioral economics |
| Applied econometrics |
| welfare economics and normative ethics |
| Science-of-science |
| I work on integrated assessments and climate policy and cost-benefit analysis more in general. |
| Development economics |
| AI Safety |
| Land use and conservation |
| Environmental Economics |
What We’re Learning
- Time commitment is substantial but manageable: Median of 8 hours for high-quality peer review
- Evaluators are highly qualified: A median of 10 years in the field and 22.5 prior reviews, with 16% having reviewed 100+ papers
- Process needs refinement: Evaluators appreciate the thoroughness but find the sliders and questionnaire length cumbersome
- Strong engagement: High re-evaluation willingness indicates value despite friction
Conclusion
Analysis of 111 evaluations shows that The Unjournal successfully engages experienced researchers in substantive reviews averaging roughly 10 hours each. While there is room for improvement, particularly around usability, evaluators remain willing to participate, suggesting the process provides value that justifies the time investment.
We’re continuing to improve the evaluation interface based on this feedback and analyzing our quantitative ratings to identify which metrics are most useful or redundant (see our other data analysis posts). We’re also seeking funding and partnerships to transition to a new evaluation platform, given PubPub’s sunsetting. Looking ahead, we aim to provide tools that help evaluators better calibrate their ratings and predictions.
Analysis code and data (excluding confidential information) are available in our GitHub repository.
Privacy: This dataset excludes confidential comments, COI information, and evaluator pseudonyms.