Performance of ChatGPT and GPT-4 on Polish National Specialty Exam (NSE) in Ophthalmology

Marcin Ciekalski; Maciej Laskowski; Agnieszka Koperczak; Maria Śmierciak; Sebastian Sirek

doi:10.2478/ahem-2024-0006



Postępy Higieny i Medycyny Doświadczalnej

Volume 78 (2024): Issue 1 (January 2024)

By: Marcin Ciekalski, Maciej Laskowski, Agnieszka Koperczak, Maria Śmierciak and Sebastian Sirek

Open Access | Sep 2024

Figures & Tables

Overall proportion of correct and incorrect answers (McNemar’s test: chi-squared = 13.4; df = 1; p < 0.001; OR (95% CI) = 3.78 (1.78–8.96))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	28 (28.6%)	9 (9.2%)	37 (37.8%)
No	34 (34.7%)	27 (27.6%)	61 (62.2%)
Column total	62 (63.3%)	36 (36.7%)	N = 98
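
The statistics in the caption above can be reproduced from the two discordant cells of the table (9 questions answered correctly only by GPT-3.5, 34 only by GPT-4). The following is a minimal sketch, not the authors' code: it assumes the continuity-corrected form of McNemar's test, which is consistent with the reported chi-squared of 13.4, and the function name `mcnemar` is hypothetical. It does not attempt to reproduce the confidence interval, since the page does not say how that was computed.

```python
import math


def mcnemar(b: int, c: int) -> tuple[float, float, float]:
    """McNemar's test with continuity correction (assumed variant).

    b: pairs where only GPT-3.5 was correct
    c: pairs where only GPT-4 was correct
    Returns (chi-squared, p-value, odds ratio c/b).
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = c / b
    return chi2, p, odds_ratio


chi2, p, odds_ratio = mcnemar(b=9, c=34)
print(f"chi-squared = {chi2:.1f}, OR = {odds_ratio:.2f}, p < 0.001: {p < 0.001}")
```

Only the discordant pairs enter the test; the 28 questions both models answered correctly and the 27 both answered incorrectly do not affect the result.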

Distribution of correct and incorrect answers for Treatment & Pharmacology question category (McNemar’s test: chi-squared = 1.5; df = 1; p = 0.2207; OR (95% CI) = 5 (0.559–236.488))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	2 (20%)	1 (10%)	3 (30%)
No	5 (50%)	2 (20%)	7 (70%)
Column total	7 (70%)	3 (30%)	N = 10

Distribution of correct and incorrect answers for Physiology & Diagnostics question category (McNemar’s test: chi-squared = 2.5; df = 1; p = 0.1138; OR (95% CI) = 4 (0.798–38.666))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	6 (28.6%)	2 (9.5%)	8 (38.1%)
No	8 (38.1%)	5 (23.8%)	13 (61.9%)
Column total	14 (66.7%)	7 (33.3%)	N = 21

Distribution of correct and incorrect answers by level of confidence for GPT-4

Level of confidence	Correct	Incorrect
Definitely sure	8	1
Very sure	40	19
Almost sure	14	16
Not very sure	-	-
Definitely not sure	-	-

Distribution of correct and incorrect answers for Pediatrics question category (McNemar’s test: chi-squared = 0.571; df = 1; p = 0.4497; OR (95% CI) = 2.5 (0.409–26.253))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	2 (18.2%)	2 (18.2%)	4 (36.4%)
No	5 (45.5%)	2 (18.2%)	7 (63.6%)
Column total	7 (63.6%)	4 (36.4%)	N = 11

Distribution of certainty levels between LLMs (rows: GPT-3.5; columns: GPT-4)

GPT-3.5 level of confidence	GPT-4: Definitely sure	GPT-4: Very sure	GPT-4: Almost sure	GPT-4: Not very sure	GPT-4: Definitely not sure	Total
Definitely sure	2 (2.04%)	18 (18.37%)	21 (21.43%)	-	-	41 (41.84%)
Very sure	7 (7.14%)	41 (41.84%)	9 (9.18%)	-	-	57 (58.16%)
Almost sure	-	-	-	-	-	-
Not very sure	-	-	-	-	-	-
Definitely not sure	-	-	-	-	-	-
Total	9 (9.18%)	59 (60.20%)	30 (30.61%)	-	-	N = 98

Distribution of correct and incorrect answers by level of confidence for GPT-3.5

Level of confidence	Correct	Incorrect
Definitely sure	18	23
Very sure	19	38
Almost sure	-	-
Not very sure	-	-
Definitely not sure	-	-
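
The two confidence tables on this page (one for GPT-4, one for GPT-3.5) are easier to compare as the share of correct answers at each declared confidence level. The sketch below computes those shares from the counts in the tables; the dictionary layout and the helper name `accuracy_by_confidence` are assumptions made for illustration, and levels that neither model used are omitted.

```python
# (correct, incorrect) counts per declared confidence level, copied from
# the two confidence tables on this page.
counts = {
    "GPT-4": {
        "Definitely sure": (8, 1),
        "Very sure": (40, 19),
        "Almost sure": (14, 16),
    },
    "GPT-3.5": {
        "Definitely sure": (18, 23),
        "Very sure": (19, 38),
    },
}


def accuracy_by_confidence(levels: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return the fraction of correct answers for each confidence level."""
    return {
        level: correct / (correct + incorrect)
        for level, (correct, incorrect) in levels.items()
    }


accuracy = {model: accuracy_by_confidence(levels) for model, levels in counts.items()}
for model, per_level in accuracy.items():
    for level, share in per_level.items():
        print(f"{model:8s} {level:16s} {share:.1%}")
```

On these counts, the share of correct answers falls with each lower confidence level for both models, although GPT-3.5 is correct on fewer than half of the questions even when it declares itself "Definitely sure".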

Distribution of correct and incorrect answers for Clinical & Case Questions question category (McNemar’s test: chi-squared = 4.083; df = 1; p = 0.0433; OR (95% CI) = 5 (1.066–46.933))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	17 (39.5%)	2 (4.7%)	19 (44.2%)
No	10 (23.3%)	14 (32.6%)	24 (55.8%)
Column total	27 (62.8%)	16 (37.2%)	N = 43

Distribution of correct and incorrect answers for Surgery question category (McNemar’s test: chi-squared = 1.125; df = 1; p = 0.2888; OR (95% CI) = 3 (0.536–30.393))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	1 (7.7%)	2 (15.4%)	3 (23.1%)
No	6 (46.2%)	4 (30.8%)	10 (76.9%)
Column total	7 (53.9%)	6 (46.2%)	N = 13
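
As a consistency check, the five per-category tables on this page can be pooled cell by cell and compared with the overall table. This is a small sketch under the assumption that every question belongs to exactly one category; the tuple layout and the helper name `pooled_cells` are hypothetical.

```python
# Each tuple is (both correct, GPT-3.5 only, GPT-4 only, both incorrect),
# copied from the category tables on this page.
categories = {
    "Treatment & Pharmacology": (2, 1, 5, 2),
    "Physiology & Diagnostics": (6, 2, 8, 5),
    "Pediatrics": (2, 2, 5, 2),
    "Clinical & Case Questions": (17, 2, 10, 14),
    "Surgery": (1, 2, 6, 4),
}

# Cells of the overall table, in the same order.
overall = (28, 9, 34, 27)


def pooled_cells(tables: dict[str, tuple[int, int, int, int]]) -> tuple[int, ...]:
    """Sum each of the four cells across all category tables."""
    return tuple(sum(cells) for cells in zip(*tables.values()))


pooled = pooled_cells(categories)
print(pooled, sum(pooled))  # → (28, 9, 34, 27) 98
print("matches overall table:", pooled == overall)
```

The pooled cells equal the overall table, so the category breakdown accounts for all 98 questions.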


DOI: https://doi.org/10.2478/ahem-2024-0006 | Journal eISSN: 1732-2693

Language: English

Page range: 111 - 116

Submitted on: Jan 11, 2024

Accepted on: Jun 19, 2024

Published on: Sep 23, 2024

Published by: Hirszfeld Institute of Immunology and Experimental Therapy

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: ophthalmology, ChatGPT, Polish national specialty exam

Related subjects: Life sciences, Molecular biology, Microbiology and virology, Medicine, Basic medical science, Immunology

© 2024 Marcin Ciekalski, Maciej Laskowski, Agnieszka Koperczak, Maria Śmierciak, Sebastian Sirek, published by Hirszfeld Institute of Immunology and Experimental Therapy
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.