Performance of ChatGPT and GPT-4 on Polish National Specialty Exam (NSE) in Ophthalmology

Open Access | Sep 2024

Figures & Tables

Overall proportion of correct and incorrect answers (McNemar’s test: chi-squared = 13.4; df = 1; p < 0.001; OR (95% CI) = 3.78 (1.78–8.96))

| GPT-3.5 correct answer | GPT-4 correct answer: Yes | GPT-4 correct answer: No | Row total |
| --- | --- | --- | --- |
| Yes | 28 (28.6%) | 9 (9.2%) | 37 (37.8%) |
| No | 34 (34.7%) | 27 (27.6%) | 61 (62.2%) |
| Column total | 62 (63.3%) | 36 (36.7%) | N = 98 |
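The statistics in the caption can be reproduced from the two discordant cells of the table (9 questions answered correctly only by GPT-3.5, 34 only by GPT-4). The sketch below is a minimal illustration, not the authors' code: it assumes McNemar's test with a continuity correction, which is an inference from the fact that this variant reproduces the reported chi-squared of 13.4. The function name `mcnemar_corrected` is hypothetical, and the confidence interval for the odds ratio is not computed here.

```python
import math


def mcnemar_corrected(only_gpt35_correct, only_gpt4_correct):
    """McNemar's test with continuity correction on the discordant pairs.

    Assumption: the continuity-corrected variant is used, because it
    matches the chi-squared value reported in the caption.
    """
    b, c = only_gpt35_correct, only_gpt4_correct
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Paired odds ratio: ratio of the two discordant counts.
    odds_ratio = c / b
    return chi2, p_value, odds_ratio


# Discordant cells from the overall table.
chi2, p_value, odds_ratio = mcnemar_corrected(9, 34)
print(f"chi-squared = {chi2:.1f}, p = {p_value:.5f}, OR = {odds_ratio:.2f}")
```

Only the discordant pairs enter the test; the 28 questions both models answered correctly and the 27 both answered incorrectly carry no information about which model is better.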

Distribution of correct and incorrect answers for Treatment & Pharmacology question category (McNemar’s test: chi-squared = 1.5; df = 1; p = 0.2207; OR (95% CI) = 5 (0.559–236.488))

| GPT-3.5 correct answer | GPT-4 correct answer: Yes | GPT-4 correct answer: No | Row total |
| --- | --- | --- | --- |
| Yes | 2 (20%) | 1 (10%) | 3 (30%) |
| No | 5 (50%) | 2 (20%) | 7 (70%) |
| Column total | 7 (70%) | 3 (30%) | N = 10 |

Distribution of correct and incorrect answers for Physiology & Diagnostics question category (McNemar’s test: chi-squared = 2.5; df = 1; p = 0.1138; OR (95% CI) = 4 (0.798–38.666))

| GPT-3.5 correct answer | GPT-4 correct answer: Yes | GPT-4 correct answer: No | Row total |
| --- | --- | --- | --- |
| Yes | 6 (28.6%) | 2 (9.5%) | 8 (38.1%) |
| No | 8 (38.1%) | 5 (23.8%) | 13 (61.9%) |
| Column total | 14 (66.7%) | 7 (33.3%) | N = 21 |

Distribution of correct and incorrect answers by level of confidence for GPT-4

| Level of confidence | GPT-4 correct | GPT-4 incorrect |
| --- | --- | --- |
| Definitely sure | 8 | 1 |
| Very sure | 40 | 19 |
| Almost sure | 14 | 16 |
| Not very sure | - | - |
| Definitely not sure | - | - |

Distribution of correct and incorrect answers for Pediatrics question category (McNemar’s test: chi-squared = 0.571; df = 1; p = 0.4497; OR (95% CI) = 2.5 (0.409–26.253))

| GPT-3.5 correct answer | GPT-4 correct answer: Yes | GPT-4 correct answer: No | Row total |
| --- | --- | --- | --- |
| Yes | 2 (18.2%) | 2 (18.2%) | 4 (36.4%) |
| No | 5 (45.5%) | 2 (18.2%) | 7 (63.6%) |
| Column total | 7 (63.6%) | 4 (36.4%) | N = 11 |

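Distribution of certainty levels between LLMs

| GPT-3.5 level of confidence | GPT-4: Definitely sure | GPT-4: Very sure | GPT-4: Almost sure | GPT-4: Not very sure | GPT-4: Definitely not sure | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Definitely sure | 2 (2.04%) | 18 (18.37%) | 21 (21.43%) | - | - | 41 (41.84%) |
| Very sure | 7 (7.14%) | 41 (41.84%) | 9 (9.18%) | - | - | 57 (58.16%) |
| Almost sure | - | - | - | - | - | - |
| Not very sure | - | - | - | - | - | - |
| Definitely not sure | - | - | - | - | - | - |
| Total | 9 (9.18%) | 59 (60.20%) | 30 (30.61%) | - | - | N = 98 |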

Distribution of correct and incorrect answers by level of confidence for GPT-3.5

| Level of confidence | GPT-3.5 correct | GPT-3.5 incorrect |
| --- | --- | --- |
| Definitely sure | 18 | 23 |
| Very sure | 19 | 38 |
| Almost sure | - | - |
| Not very sure | - | - |
| Definitely not sure | - | - |
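One way to read the two confidence tables is to convert each row into an accuracy, which shows how well each model's stated confidence tracks correctness. The sketch below is illustrative only: the counts are transcribed from the GPT-4 and GPT-3.5 confidence tables, while the names `confidence_counts` and `accuracy_by_confidence` are hypothetical and the derived percentages are not figures reported by the authors.

```python
# (correct, incorrect) counts per stated confidence level, transcribed
# from the two confidence tables. Levels no model used are omitted.
confidence_counts = {
    "GPT-4": {
        "Definitely sure": (8, 1),
        "Very sure": (40, 19),
        "Almost sure": (14, 16),
    },
    "GPT-3.5": {
        "Definitely sure": (18, 23),
        "Very sure": (19, 38),
    },
}


def accuracy_by_confidence(counts):
    """Return the share of correct answers for each confidence level."""
    return {
        level: correct / (correct + incorrect)
        for level, (correct, incorrect) in counts.items()
    }


for model, counts in confidence_counts.items():
    for level, accuracy in accuracy_by_confidence(counts).items():
        n = sum(counts[level])
        print(f"{model:8s} {level:16s} n={n:2d} accuracy={accuracy:.1%}")
```

Under this reading, GPT-4's accuracy falls as its stated confidence falls, whereas GPT-3.5 is correct in fewer than half of the answers it labels "Definitely sure".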

Distribution of correct and incorrect answers for Clinical & Case Questions question category (McNemar’s test: chi-squared = 4.083; df = 1; p = 0.0433; OR (95% CI) = 5 (1.066–46.933))

| GPT-3.5 correct answer | GPT-4 correct answer: Yes | GPT-4 correct answer: No | Row total |
| --- | --- | --- | --- |
| Yes | 17 (39.5%) | 2 (4.7%) | 19 (44.2%) |
| No | 10 (23.3%) | 14 (32.6%) | 24 (55.8%) |
| Column total | 27 (62.8%) | 16 (37.2%) | N = 43 |

Distribution of correct and incorrect answers for Surgery question category (McNemar’s test: chi-squared = 1.125; df = 1; p = 0.2888; OR (95% CI) = 3 (0.536–30.393))

| GPT-3.5 correct answer | GPT-4 correct answer: Yes | GPT-4 correct answer: No | Row total |
| --- | --- | --- | --- |
| Yes | 1 (7.7%) | 2 (15.4%) | 3 (23.1%) |
| No | 6 (46.2%) | 4 (30.8%) | 10 (76.9%) |
| Column total | 7 (53.9%) | 6 (46.2%) | N = 13 |
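As a consistency check, the five category tables should add up cell by cell to the overall table of 98 questions. The sketch below is a hedged illustration using counts transcribed from the tables. The tuple layout and the names `category_tables`, `overall_table`, and `accuracies` are assumptions made for this example, not part of the original analysis.

```python
# Each tuple is (both correct, only GPT-3.5 correct,
#                only GPT-4 correct, both incorrect).
category_tables = {
    "Treatment & Pharmacology": (2, 1, 5, 2),
    "Physiology & Diagnostics": (6, 2, 8, 5),
    "Pediatrics": (2, 2, 5, 2),
    "Clinical & Case Questions": (17, 2, 10, 14),
    "Surgery": (1, 2, 6, 4),
}
overall_table = (28, 9, 34, 27)

# Sum the category tables cell by cell and compare with the overall table.
summed = tuple(sum(cells) for cells in zip(*category_tables.values()))
tables_consistent = summed == overall_table


def accuracies(table):
    """Return (GPT-3.5 accuracy, GPT-4 accuracy) for one 2x2 table."""
    both, only_gpt35, only_gpt4, _neither = table
    n = sum(table)
    return (both + only_gpt35) / n, (both + only_gpt4) / n


print("Category tables sum to overall table:", tables_consistent)
for name, table in category_tables.items():
    acc_35, acc_4 = accuracies(table)
    print(f"{name:26s} N={sum(table):2d} GPT-3.5={acc_35:.1%} GPT-4={acc_4:.1%}")
```

The per-category accuracies are the row and column totals of each table expressed as fractions, so they can be compared directly with the percentages shown above.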
Language: English
Page range: 111 - 116
Submitted on: Jan 11, 2024
Accepted on: Jun 19, 2024
Published on: Sep 23, 2024
Published by: Hirszfeld Institute of Immunology and Experimental Therapy
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Marcin Ciekalski, Maciej Laskowski, Agnieszka Koperczak, Maria Śmierciak, Sebastian Sirek, published by Hirszfeld Institute of Immunology and Experimental Therapy
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.