Mean Average Deviations (MAD) for direct predictions and predictions with linear regression for each model and input. The improve columns give the percentage reduction in MAD compared to the baseline strategy of assigning each article the overall human average, 2.75.

| Model and input | Direct: MAD | Direct: Improve | Regression: Intercept | Regression: Coefficient | Regression: MAD | Regression: Improve |
|---|---|---|---|---|---|---|
| GPT-3.5 turbo: Titles | 0.68 | 6% | -1.16 | 1.57 | 0.63 | 13% |
| GPT-3.5 turbo: Abstracts | 0.60 | 17% | -3.46 | 2.26 | 0.51 | 30% |
| GPT-3.5 turbo: Truncated text | 0.70 | 4% | -7.49 | 3.38 | 0.55 | 24% |
| GPT-4o-mini: Abstracts | 0.63 | 13% | -3.32 | 2.07 | 0.59 | 19% |
| GPT-4o-mini: Truncated text | 0.75 | -3% | -2.44 | 1.61 | 0.60 | 17% |
| GPT-4o: Abstracts | 0.62 | 14% | -3.40 | 2.05 | 0.50 | 31% |
| GPT-4o: Truncated text | 0.69 | 5% | -4.44 | 2.28 | 0.50 | 31% |
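
The quantities in this table can be related to one another with a short sketch. It takes MAD to mean the mean absolute difference between predicted and human scores (an assumption about the caption's definition), fits a one-variable least-squares line from model scores to human scores, and reports the percentage improvement over the constant baseline. The score lists are invented placeholder values, not data from the study, and the function names `mad`, `fit_line` and `improvement` are hypothetical. The sketch fits and evaluates on the same toy data, which may differ from the procedure used for the table.

```python
def mad(predicted, actual):
    """Mean absolute difference between paired predictions and targets."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def fit_line(x, y):
    """Ordinary least-squares (intercept, slope) for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def improvement(model_mad, baseline_mad):
    """Percentage reduction in MAD relative to the baseline."""
    return 100 * (baseline_mad - model_mad) / baseline_mad


# Hypothetical placeholder scores (NOT from the study).
human = [1, 2, 2, 3, 3, 4]
model = [2.5, 2.8, 3.0, 3.1, 3.4, 3.6]

# Baseline: give every article the overall human average.
baseline = [sum(human) / len(human)] * len(human)
baseline_mad = mad(baseline, human)

# Direct: use the model score as the prediction.
direct_mad = mad(model, human)

# Regression: rescale the model score with a fitted line first.
intercept, slope = fit_line(model, human)
regressed = [intercept + slope * m for m in model]
regression_mad = mad(regressed, human)

print(f"baseline MAD:   {baseline_mad:.2f}")
print(f"direct MAD:     {direct_mad:.2f} "
      f"(improve {improvement(direct_mad, baseline_mad):.0f}%)")
print(f"regression MAD: {regression_mad:.2f} "
      f"(improve {improvement(regression_mad, baseline_mad):.0f}%), "
      f"intercept {intercept:.2f}, coefficient {slope:.2f}")
```

A negative improvement, as in the GPT-4o-mini truncated text row, means that the direct predictions have a larger MAD than the constant baseline.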

Spearman correlations between human scores and model average scores (over 30 iterations) for 51 information science articles. Values above 0.75 (excluding the diagonal) are highlighted in bold.

| Spearman correlation | GPT-3.5 turbo: Abstracts | GPT-3.5 turbo: Truncated text | GPT-4o-mini: Abstracts | GPT-4o-mini: Truncated text | GPT-4o: Abstracts | GPT-4o: Truncated text | Human |
|---|---|---|---|---|---|---|---|
| GPT-3.5 turbo: Titles | 0.439 | 0.444 | 0.359 | 0.499 | 0.539 | 0.589 | 0.434 |
| GPT-3.5 turbo: Abstracts | 1.000 | **0.757** | 0.700 | 0.718 | **0.875** | **0.774** | 0.674 |
| GPT-3.5 turbo: Truncated text | | 1.000 | 0.672 | 0.686 | 0.732 | **0.783** | 0.625 |
| GPT-4o-mini: Abstracts | | | 1.000 | 0.608 | 0.729 | 0.653 | 0.571 |
| GPT-4o-mini: Truncated text | | | | 1.000 | **0.813** | **0.801** | 0.506 |
| GPT-4o: Abstracts | | | | | 1.000 | **0.858** | 0.678 |
| GPT-4o: Truncated text | | | | | | 1.000 | 0.675 |
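
As a rough illustration of how one cell of such a matrix could be computed, the sketch below averages each article's scores over repeated model runs and then takes the Spearman correlation with the human scores, using average ranks for ties. The data are invented placeholders (three runs for five articles rather than 30 runs for 51 articles), and the helper names `average_ranks`, `pearson` and `spearman` are hypothetical. It is not the code used to produce the table.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical placeholder data (NOT from the study):
# one row per article, one column per repeated model run.
model_runs = [
    [3, 3, 4],
    [2, 3, 3],
    [4, 4, 4],
    [2, 2, 3],
    [3, 4, 4],
]
human = [3, 2, 4, 2, 3]

# Average the repeated runs for each article before correlating.
model_mean = [sum(runs) / len(runs) for runs in model_runs]
rho = spearman(model_mean, human)
print(f"Spearman correlation: {rho:.3f}")
```

Because the correlation uses ranks only, it is unaffected by a model scoring systematically higher or lower than the human assessors, which is why it is reported separately from MAD.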

Average human scores and model average scores.

| | Human | GPT-3.5 turbo: Titles | GPT-3.5 turbo: Abstracts | GPT-3.5 turbo: Truncated text | GPT-4o-mini: Abstracts | GPT-4o-mini: Truncated text | GPT-4o: Abstracts | GPT-4o: Truncated text |
|---|---|---|---|---|---|---|---|---|
| Mean score | 2.75 | 2.49 | 2.75 | 3.03 | 2.93 | 3.22 | 2.99 | 3.16 |