Figure 1.

Figure 2.

The scores given by ChatGPT-4 REF D and by me to 51 of my open access articles.
| Score | GPT (n) | GPT (%) | Me (n) | Me (%) |
|---|---|---|---|---|
| 1* | 0 | 0.0% | 2 | 4% |
| 1.5* | 0 | 0.0% | 3 | 6% |
| 2* | 14 | 1.8% | 12 | 24% |
| 2.33* | 1 | 0.1% | 0 | 0% |
| 2.5* | 2 | 0.3% | 9 | 18% |
| 2.67* | 2 | 0.3% | 0 | 0% |
| 2.75* | 0 | 0.0% | 1 | 2% |
| 3* | 509 | 66.5% | 8 | 16% |
| 3.33* | 9 | 1.2% | 0 | 0% |
| 3.5* | 14 | 1.8% | 7 | 14% |
| 3.67* | 15 | 2.0% | 0 | 0% |
| 4* | 199 | 26.0% | 9 | 18% |
| Total | 765 | 100.0% | 51 | 100% |
Pearson correlations for 51 of my open access articles, comparing my initial scores with the scores from ChatGPT-4 REF D.
| Correlation | All articles | Articles scored 2.5+ by me | Articles scored 3+ by me |
|---|---|---|---|
| GPT average vs. author (95% CI) | 0.509 (0.271, 0.688) | 0.200 (-0.148, 0.504) | 0.246 (-0.175, 0.590) |
| GPT vs. author, average of 15 pairs (fraction of 95% CIs excluding 0) | 0.281 (8/15) | 0.102 (1/15) | 0.128 (1/15) |
| GPT vs. GPT (average of 105 pairs) | 0.245 | 0.194 | 0.215 |
| Sample size (articles) | 51 | 34 | 24 |
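
As an illustration of how 95% confidence intervals like those in the first row can be obtained, the sketch below applies the standard Fisher z-transformation to a Pearson correlation. This is an assumption: the method used for the intervals is not stated here, although the results agree with the reported limits to within rounding of the published correlations. The function name `fisher_ci` is hypothetical. The last line checks that 15 sets of GPT scores give 105 distinct GPT vs. GPT pairs.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation r from n paired scores.

    Assumption: Fisher z-transformation with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    lower = math.tanh(z - z_crit * se)  # back-transform the limits to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Correlations and sample sizes taken from the table (GPT average vs. author).
rows = [
    ("All articles", 0.509, 51),
    ("Articles scored 2.5+ by me", 0.200, 34),
    ("Articles scored 3+ by me", 0.246, 24),
]

for label, r, n in rows:
    lower, upper = fisher_ci(r, n)
    print(f"{label}: r = {r:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")

# Number of distinct pairs among 15 sets of GPT scores.
print(math.comb(15, 2))  # 105
```

An interval that contains 0, as for the two restricted subsets, means the correlation is not statistically distinguishable from zero at the 5% level for that sample size.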