What is The Unjournal? We commission and publish independent, public evaluations of research that
can inform high-stakes global decisions. We focus on economics, quantitative social science, forecasting,
and policy-relevant research—including development economics, global health, animal welfare,
AI governance, climate policy, and catastrophic risks.
Early prototype (March 2026). Coverage and scoring depth will improve as we expand sources and
incorporate human feedback. Scores are AI-generated suggestions to help identify candidates for evaluation.
How it works: Papers are automatically discovered from multiple academic sources, then scored by AI models against the Unjournal's prioritization criteria.
Sources currently scanned: NBER (economics working papers), arXiv (econ, quantitative finance, and cs.CY for AI governance/social impact), CEPR (European economics), EA Forum (effective altruism research links), Semantic Scholar (AI-powered search by cause area), OpenAlex/SSRN (social science preprints), RePEc (economics working papers), Anthropic Economic Research & Societal Impacts team pages, and DeepMind and AI governance organization papers (GovAI, CSET, GPI). New papers are fetched periodically and scored automatically.
Scores reflect evaluation priority: the expected value of commissioning an independent evaluation, not an assessment of research quality. Evaluation priority is how strongly we recommend commissioning an independent Unjournal evaluation of a paper. It considers: (1) Is this research relevant to important global welfare decisions? (2) Would independent evaluation add value beyond existing peer review? (3) Is the paper at a stage where feedback can improve it? (4) Are the authors likely to engage? A high priority score does NOT mean the research is good or bad; it means evaluation would be particularly valuable.
We welcome both team and public feedback.
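To make the discovery-and-scoring step concrete, here is a minimal sketch of the kind of loop described above. The function names (run_discovery_cycle, fetch_new_papers, score_paper), the SOURCES identifiers, and the record structure are illustrative assumptions, not the tool's actual code.

```python
# Illustrative sketch of the periodic discover-and-score loop (names are assumptions).
SOURCES = ["nber", "arxiv", "cepr", "ea_forum", "semantic_scholar",
           "openalex_ssrn", "repec", "anthropic", "deepmind", "govai_cset_gpi"]

def run_discovery_cycle(fetch_new_papers, score_paper, database):
    """Fetch recent papers from each source and attach an AI priority score."""
    for source in SOURCES:
        for paper in fetch_new_papers(source):      # e.g. new working papers since the last run
            if paper.id in database:                # skip papers already scored
                continue
            result = score_paper(paper)             # AI model applies the prioritization rubrics
            database[paper.id] = {
                "title": paper.title,
                "source": source,
                "priority_score": result.score,     # 0-100 evaluation-priority score
                "rationale": result.rationale,      # short explanation shown on the paper card
            }
```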
Core principle
Prioritization = expected value of commissioning an evaluation, not quality endorsement.
A prominent but flawed paper may score HIGHER than a rigorous but obscure one, because independent evaluation adds more value there.
Two-track assessment
Criteria weights depend on whether the work is prominent or not:
Criterion                   Prominent work   Less-prominent work
Decision relevance          40%              30%
Timing value                25%              15%
Real-world influence        20%              20%
Methodological potential    10%              25%
Prominence                   5%              10%
For prominent work (NBER, CEPR, World Bank, top journals), decision-relevance dominates. For less-prominent work, methodology becomes the tie-breaker—our evaluation could boost neglected but rigorous research.
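As a rough illustration of how the two tracks combine criterion scores, here is a minimal sketch. It assumes each criterion is scored 0–10, the weighted sum is rescaled to 0–100, and prominence determines which weight set applies; the function names and the example inputs are assumptions for illustration, not the tool's actual implementation.

```python
# Illustrative two-track weighting (weights from the table above; other details are assumptions).
WEIGHTS = {
    "prominent":      {"decision_relevance": 0.40, "timing_value": 0.25,
                       "real_world_influence": 0.20, "methodological_potential": 0.10,
                       "prominence": 0.05},
    "less_prominent": {"decision_relevance": 0.30, "timing_value": 0.15,
                       "real_world_influence": 0.20, "methodological_potential": 0.25,
                       "prominence": 0.10},
}

def priority_score(criterion_scores, prominent):
    """Combine 0-10 criterion scores into a 0-100 priority score using the relevant track."""
    weights = WEIGHTS["prominent" if prominent else "less_prominent"]
    weighted = sum(weights[name] * criterion_scores[name] for name in weights)
    return round(weighted * 10, 1)   # rescale the 0-10 weighted average to 0-100

# Example: a prominent working paper with strong decision relevance
example = {"decision_relevance": 9, "timing_value": 8, "real_world_influence": 7,
           "methodological_potential": 7, "prominence": 9}
print(priority_score(example, prominent=True))   # -> 81.5
```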
Scoring rubrics (0–10 each)
1. Global Decision-Relevance (most important)
9–10: Directly informs active decisions by major funders or policymakers (GiveWell cost-effectiveness, WHO policy, climate treaty design). Specific organizations can be named.
7–8: Addresses a recognized global priority with clear policy implications, but the link to specific decisions is less direct.
5–6: Relevant to global welfare in a general sense. Interesting for the field but specific decision-relevance is moderate.
3–4: Tangentially related to global priorities. Primarily academic interest.
1–2: No clear connection to decisions affecting global welfare.
Field-specific: Development economics & LMIC health are our strongest areas. AI governance papers must be genuinely quantitative, not conceptual think-pieces. Animal welfare intervention evidence is highly valued.
2. Prominence
9–10: NBER working paper, top-5 journal, Nobel/Clark laureate, >500 citations, major media coverage.
1–2: Unknown author, self-published, no institutional backing.
Note: ALL NBER papers score ≥8. ALL CEPR/World Bank/IMF papers score ≥7. Prominence is about whether the research community is paying attention—prominent flawed work NEEDS evaluation more than obscure good work.
3. Real-World Influence
9–10: Already cited in policy documents, GiveWell/Open Phil analyses, government reports. Named organizations are using this.
7–8: Likely to influence decisions soon. In an active policy debate. Authors have policy connections.
5–6: Could influence decisions if findings hold up. Relevant to active debates but not yet cited.
3–4: Academic contribution with indirect policy relevance.
1–2: Purely academic exercise with no clear path to influence.
4. Timing Value
9–10: Working paper/preprint released in last 6 months. No peer review yet. Authors actively seeking feedback.
7–8: Working paper 6–18 months old. Under review but not yet published.
5–6: Recently published (1–2 years) in a venue where more review would add value. R&R at journal.
3–4: Published 2+ years ago but still influential. Adds transparency but less urgency.
1–2: Old published work with established peer review. Feedback largely moot.
By methodology: RCTs & field experiments benefit most from early feedback (pre-registration, pre-analysis plans). Policy reports have narrow windows. Theoretical work is less time-sensitive.
5. Methodological Potential
For prominent work: This is a secondary consideration. If it’s prominent and decision-relevant, score 7+ and move on. Quality assessment is for the evaluation stage.
3–4: Methodological concerns that would make evaluation difficult.
1–2: Not really quantitative. Literature review, opinion piece, or purely conceptual.
Field-appropriate standards (don’t penalize fields where RCTs aren’t possible):
Development/health: RCTs, DiD, regression discontinuity, IV
Environmental/climate: Integrated assessment models, panel data, natural experiments
AI governance: Mixed methods, surveys, formal models
Animal welfare: Stated preference, DCEs, welfare calculations
Political science: Quasi-experimental, panel data, surveys
Macro/trade: DSGE, gravity equations, synthetic control
Score interpretation
Score range   Recommended action   What it means
75–100        Prioritize now       Strong candidate. Matches papers that were actually sent for evaluation by the UJ team.
50–74         Monitor              Borderline. In the range where the human team often disagreed.
25–49         Deprioritize         Below threshold. Matches papers human assessors scored low.
<25           Out of scope         Not quantitative social science, or fundamentally outside UJ coverage.
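These bands can be read as simple lower-bound thresholds on the 0–100 score. A minimal sketch (the function name is an assumption):

```python
def recommended_action(score):
    """Map a 0-100 priority score to the action bands in the table above."""
    if score >= 75:
        return "Prioritize now"
    if score >= 50:
        return "Monitor"
    if score >= 25:
        return "Deprioritize"
    return "Out of scope"
```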
Calibration
Scores are calibrated against 353 actual human prioritization decisions from the Unjournal team. The AI scores are systematically compared to human assessor ratings, and field-specific corrections are applied. Read more about UJ’s prioritization process.
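One simple way to implement the field-specific correction described above is a per-field linear adjustment fitted against the human ratings. The sketch below assumes that approach and an ordinary least-squares fit; the actual calibration method, data format, and function names may differ.

```python
import numpy as np

def fit_field_corrections(records):
    """Fit a per-field (slope, intercept) mapping raw AI scores onto human ratings.

    `records` is an iterable of (field, ai_score, human_score) tuples drawn from
    past prioritization decisions. This is an illustrative assumption about the
    calibration, not the tool's actual method.
    """
    by_field = {}
    for field, ai, human in records:
        by_field.setdefault(field, []).append((ai, human))
    corrections = {}
    for field, pairs in by_field.items():
        ai = np.array([p[0] for p in pairs], dtype=float)
        human = np.array([p[1] for p in pairs], dtype=float)
        slope, intercept = np.polyfit(ai, human, 1)   # least-squares line per field
        corrections[field] = (slope, intercept)
    return corrections

def calibrated_score(raw_score, field, corrections):
    """Apply the field's correction, falling back to the raw score if the field is unseen."""
    slope, intercept = corrections.get(field, (1.0, 0.0))
    return float(np.clip(slope * raw_score + intercept, 0, 100))
```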
Four-stage pipeline
Suggesting — A paper is suggested (by AI or human) with a 0–100 rating and discussion of relevance
Assessing — A second team member gives an independent rating (without seeing the first)
Voting — If avg rating ≥ 65%, the field group votes (Strong Yes to Strong No)
Evaluation — An evaluation manager commissions 2+ public evaluations via PubPub
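As a sketch of how a paper moves between these stages, the snippet below models the threshold gate between assessing and voting. The data structure, field names, and default threshold are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Suggestion:
    paper_id: str
    suggester_rating: float                    # 0-100, first (suggesting) rating
    assessor_rating: Optional[float] = None    # 0-100, independent second rating
    vote_outcome: Optional[str] = None         # "Strong Yes" ... "Strong No"

def advances_to_voting(s: Suggestion, threshold: float = 65.0) -> bool:
    """Stage 3 gate: the field group votes only if the average rating meets the threshold."""
    if s.assessor_rating is None:
        return False
    return (s.suggester_rating + s.assessor_rating) / 2 >= threshold
```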
Comment directly on this page using the
Hypothes.is sidebar
(look for the < tab on the right edge of the page). Highlight any text and add your annotation —
visible to all Hypothes.is users. You can also use the feedback buttons on each paper card.
Vision: How this tool will work
We are building an efficient, AI-augmented prioritization pipeline:
AI discovery & preliminary rating — The tool finds, vets, and suggests research from multiple sources (NBER, arXiv, SSRN, EA Forum, Anthropic/DeepMind research pages, GovAI/CSET/GPI, etc.), giving a preliminary score and adding it to the prioritization database.
Human suggestions — Team members and the public can also add research directly as a "suggester" or "submitter," in which case the AI provides an additional analysis report.
Notifications — Sign up for alerts when new high-potential research in your area is added.
Team assessment — Team members review suggestions, find those of most interest, and give independent ratings. These may be used to continually train and improve the AI recommendation model.
Voting & decisions — The team votes (as in our current process), moving papers forward for commissioned evaluation.
The AI uses Unjournal's core principles and previous prioritization decisions as context.
We welcome your thoughts on this workflow — use the Hypothes.is sidebar or email
contact@unjournal.org.