1  Evaluation data: input/features

Reconcile uncertainty ratings and CIs

Where people gave only a confidence level (the 1-5 ‘dots’), we impute CIs (confidence/credible intervals), following the correspondence described here. Where they gave actual CIs, we use those.1

5 = Extremely confident: 90% confidence interval spans +/- 4 points or less

For 0-100 ratings, code the LB as \(max(R - 4,0)\) and the UB as \(min(R + 4,100)\), where R is the stated (middle) rating. This truncates the interval, as interpreted, at the ends of the scale, so its width is at most 8 points; the full width applies when R is between 4 and 96.

4 = Very confident: 90% confidence interval +/- 8 points or less

For 0-100 ratings, code the LB as \(max(R - 8,0)\) and the UB as \(min(R + 8,100)\), where R is the stated (middle) rating.

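3 = Somewhat confident: 90% confidence interval +/- 15 points or less

2 = Not very confident: 90% confidence interval +/- 25 points or less

For ratings of 3 and 2, use the same coding as for ratings of 4 and 5, with half-widths of 15 and 25 points respectively: code the LB as \(max(R - 15,0)\) or \(max(R - 25,0)\), and the UB as \(min(R + 15,100)\) or \(min(R + 25,100)\).

1 = Not confident: 90% confidence interval +/- more than 25 points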

Code LB as \(max(R - 37.5,0)\) and the UB as \(min(R + 37.5,100)\).
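The correspondence above can be sketched as a small lookup plus clipping. This is only an illustrative sketch, not the code used in the pipeline: the names `HALF_WIDTH` and `impute_ci` are hypothetical, and the half-widths are simply the values listed above.

```python
# Half-widths (in points) of the imputed 90% interval for each confidence
# level, copied from the correspondence above. Names here are hypothetical.
HALF_WIDTH = {5: 4.0, 4: 8.0, 3: 15.0, 2: 25.0, 1: 37.5}


def impute_ci(rating, confidence, lo=0.0, hi=100.0):
    """Return (LB, UB) for a rating on the lo-hi scale, clipped to [lo, hi]."""
    if confidence not in HALF_WIDTH:
        raise ValueError(f"confidence must be an integer from 1 to 5, got {confidence!r}")
    if not lo <= rating <= hi:
        raise ValueError(f"rating must lie in [{lo}, {hi}], got {rating!r}")
    half = HALF_WIDTH[confidence]
    # LB = max(R - half, lo), UB = min(R + half, hi)
    return max(rating - half, lo), min(rating + half, hi)


if __name__ == "__main__":
    for r, c in [(50, 5), (98, 4), (10, 1)]:
        print(r, c, impute_ci(r, c))
```

Where an actual CI was stated, it would be used directly and this function would not be called.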

This is just a first pass. There might be a more principled (e.g., information-theoretic) way of doing this. On the other hand, we might soon switch the evaluations to a different tool, perhaps dropping the 1-5 confidence ratings altogether.

We cannot publicly share the ‘papers under consideration’, but we can share some of the statistics on these papers. Let’s generate an ID (or, later, a salted hash) for each such paper, and keep only the shareable features of interest.
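One possible way to build such identifiers is a salted SHA-256 hash of a normalized title, keeping only an allow-list of fields. This is a sketch under assumptions: the functions `paper_id` and `keep_shareable`, the `title` key, and the example field names are all hypothetical, and the salt would need to be stored privately and never published.

```python
import hashlib
import secrets


def make_salt(n_bytes=16):
    """Generate a random salt; keep it private and reuse it across runs."""
    return secrets.token_hex(n_bytes)


def paper_id(title, salt, length=12):
    """Return a short hex ID derived from the salt and a normalized title."""
    normalized = " ".join(title.lower().split())  # lower-case, collapse whitespace
    digest = hashlib.sha256((salt + "|" + normalized).encode("utf-8")).hexdigest()
    return digest[:length]


def keep_shareable(record, salt, shareable, key="title"):
    """Keep only allow-listed fields and attach the hashed ID."""
    out = {k: record[k] for k in shareable if k in record}
    out["paper_id"] = paper_id(record[key], salt)
    return out


if __name__ == "__main__":
    salt = make_salt()
    # Hypothetical record; field names are for illustration only.
    rec = {"title": "An Example Paper", "field": "economics", "notes": "private"}
    print(keep_shareable(rec, salt, shareable=["field"]))
```

Truncating the digest keeps IDs short, at the cost of a small collision risk that would be worth checking on the actual list of papers.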

  1. Note this is only a first-pass; a more sophisticated approach may be warranted in future.↩︎