References
- McCoubrie P. Improving the fairness of multiple-choice questions: A literature review. Med Teach. 2004;26(8):709–12. DOI: 10.1080/01421590400013495
- Mirbahai L, Adie JW. Applying the utility index to review single best answer questions in medical education assessment. Archives of Epidemiology and Public Health. 2020;2(1). DOI: 10.15761/AEPH.1000113
- Karthikeyan S, O’Connor E, Hu W. Barriers and facilitators to writing quality items for medical school assessments – a scoping review. BMC Med Educ. 2019;19(1):123. DOI: 10.1186/s12909-019-1544-8
- Artsi Y, Sorin V, Konen E, Glicksberg BS, Nadkarni G, Klang E. Large language models for generating medical examinations: Systematic review. BMC Med Educ. 2024;24(1):354. DOI: 10.1186/s12909-024-05239-y
- Ahmed A, Kerr E, O’Malley A. Quality assurance and validity of AI-generated single best answer questions. BMC Med Educ. 2025;25(1):300. DOI: 10.1186/s12909-025-06881-w
- Kaya M, Sonmez E, Halici A, Yildirim H, Coskun A. Comparison of AI-generated and clinician-designed multiple-choice questions in emergency medicine exam: A psychometric analysis. BMC Med Educ. 2025;25(1):949. DOI: 10.1186/s12909-025-07528-6
- Wu H, Zerner T, Lee D, Court-Kowalski S, Devitt P, Palmer E. GPT-4 versus human authors in clinically complex MCQ creation: A blinded analysis of item quality. Med Teach. 2025;1–14. DOI: 10.21203/rs.3.rs-4831476/v1
- Van Der Vleuten CPM. The assessment of professional competence: Developments, research and practical implications. Adv Health Sci Educ. 1996;1(1):41–67. DOI: 10.1007/BF00596229
- Tavakol M, Dennick R. Making sense of Cronbach’s alpha. Int J Med Educ. 2011;2:53–5. DOI: 10.5116/ijme.4dfb.8dfd
- Heeneman S, de Jong LH, Dawson LJ, Wilkinson TJ, Ryan A, Tait GR, et al. Ottawa 2020 consensus statement for programmatic assessment – 1. Agreement on the principles. Med Teach. 2021;43(10):1139–48. DOI: 10.1080/0142159X.2021.1957088
- Pham H, Besanko J, Devitt P. Examining the impact of specific types of item-writing flaws on student performance and psychometric properties of the multiple choice question. MedEdPublish (2016). 2018;7:225. DOI: 10.15694/mep.2018.0000225.1
- Lee HY, Yune SJ, Lee SY, Im S, Kam BS. The impact of repeated item development training on the prediction of medical faculty members’ item difficulty index. BMC Med Educ. 2024;24(1):599. DOI: 10.1186/s12909-024-05577-x
- Webb EM, Phuong JS, Naeger DM. Does educator training or experience affect the quality of multiple-choice questions? Acad Radiol. 2015;22(10):1317–22. DOI: 10.1016/j.acra.2015.06.012
- Taheri R, Nazemi N, Pennington SE, Clark JA, Dadgostari F. Factors influencing educators’ AI adoption: A grounded meta-analysis review. Computers and Education: Artificial Intelligence. 2025;9:100464. DOI: 10.1016/j.caeai.2025.100464
- Komasawa N, Yokohira M. Generative artificial intelligence (AI) in medical education: A narrative review of the challenges and possibilities for future professionalism. Cureus. 2025;17(6):e86316. DOI: 10.7759/cureus.86316
- Khakpaki A. Advancements in artificial intelligence transforming medical education: A comprehensive overview. Med Educ Online. 2025;30(1):2542807. DOI: 10.1080/10872981.2025.2542807
- Preiksaitis C, Rose C. Opportunities, challenges, and future directions of generative artificial intelligence in medical education: Scoping review. JMIR Med Educ. 2023;9:e48785. DOI: 10.2196/48785
- Downing SM. The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract. 2005;10(2):133–43. DOI: 10.1007/s10459-004-4019-5
- Russell RG, Lovett Novak L, Patel M, Garvey KV, Craig KJT, Jackson GP, et al. Competencies for the use of artificial intelligence-based tools by health care professionals. Acad Med. 2023;98(3):348–56. DOI: 10.1097/ACM.0000000000004963
- Storey VC, Yue WT, Zhao JL, Lukyanenko R. Generative artificial intelligence: Evolving technology, growing societal impact, and opportunities for information systems research. Inf Syst Front. 2025;27(5):2081–102. DOI: 10.1007/s10796-025-10581-7
- Ng IKS, Goh WGW, Teo DB, Chong KM, Tan LF, Teoh CM. Clinical reasoning in real-world practice: A primer for medical trainees and practitioners. Postgrad Med J. 2024;101(1191):68–75. DOI: 10.1093/postmj/qgae079
- Gruppen LD. Clinical reasoning: Defining it, teaching it, assessing it, studying it. West J Emerg Med. 2017;18(1):4–7. DOI: 10.5811/westjem.2016.11.33191
- Ngo A, Gupta S, Perrine O, Reddy R, Ershadi S, Remick D. ChatGPT 3.5 fails to write appropriate multiple choice practice exam questions. Acad Pathol. 2024;11(1):100099. DOI: 10.1016/j.acpath.2023.100099
- Messick S. Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Am Psychol. 1995;50(9):741. DOI: 10.1037/0003-066X.50.9.741
- Boscardin CK, Gin B, Golde PB, Hauer KE. ChatGPT and generative artificial intelligence for medical education: Potential impact and opportunity. Acad Med. 2024;99(1):22–7. DOI: 10.1097/ACM.0000000000005439
- Kıyak YS, Coşkun Ö, Budakoğlu I, Uluoğlu C. ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam. Eur J Clin Pharmacol. 2024;80(5):729–35. DOI: 10.1007/s00228-024-03649-x
- Cross JL, Choma MA, Onofrey JA. Bias in medical AI: Implications for clinical decision-making. PLOS Digit Health. 2024;3(11):e0000651. DOI: 10.1371/journal.pdig.0000651
- Masters K, MacNeil H, Benjamin J, Carver T, Nemethy K, Valanci-Aroesty S, et al. Artificial intelligence in health professions education assessment: AMEE Guide No. 178. Med Teach. 2025;47(9):1410–24. DOI: 10.1080/0142159X.2024.2445037
- Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, et al. Chain-of-thought prompting elicits reasoning in large language models. Adv Neural Inf Process Syst. 2022;35:24824–37.
- Cho Y, Park GL, Waite GN, Mudigonda A, Szarek JL. Development of a universal prompt as a scalable generative AI-assisted tool for USMLE Step 1 style multiple-choice question refinement in medical education. Med Sci Educ. 2025;35(2):611–3. DOI: 10.1007/s40670-025-02334-7
- Norcini JJ, McKinley DW. Assessment methods in medical education. Teach Teach Educ. 2007;23(3):239–50. DOI: 10.1016/j.tate.2006.12.021
- Barker AP. Artificial intelligence in health education within higher education institutions. Evid Based Nurs. 2025;28(3):147. DOI: 10.1136/ebnurs-2025-104314
- Greenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O. Diffusion of innovations in service organizations: Systematic review and recommendations. Milbank Q. 2004;82(4):581–629. DOI: 10.1111/j.0887-378X.2004.00325.x
- Moldt JA, Festl-Wietek T, Fuhl W, Zabel S, Claassen M, Wagner S, et al. Assessing AI awareness and identifying essential competencies: Insights from key stakeholders in integrating AI into medical education. JMIR Med Educ. 2024;10:e58355. DOI: 10.2196/58355
- Capan Melser M, Steiner-Hofbauer V, Lilaj B, Agis H, Knaus A, Holzinger A. Knowledge, application and how about competence? Qualitative assessment of multiple-choice questions for dental students. Med Educ Online. 2020;25(1):1714199. DOI: 10.1080/10872981.2020.1714199
- Tolentino R, Baradaran A, Gore G, Pluye P, Abbasgholizadeh-Rahimi S. Curriculum frameworks and educational programs in AI for medical students, residents, and practicing physicians: Scoping review. JMIR Med Educ. 2024;10:e54793. DOI: 10.2196/54793
- D’Souza R, Mathew M, Mishra V, Surapaneni KM. Twelve tips for addressing ethical concerns in the implementation of artificial intelligence in medical education. Med Educ Online. 2024;29(1). DOI: 10.1080/10872981.2024.2330250
- Chadha N, Popil E, Gregory J, Armstrong-Davies L, Justin G. How do we teach generative artificial intelligence to medical educators? Pilot of a faculty development workshop using ChatGPT. Med Teach. 2024;1–3. DOI: 10.1080/0142159X.2024.2341806
- Youm J, Corral J. Technological pedagogical content knowledge among medical educators: What is our readiness to teach with technology? Acad Med. 2019;94(11S Association of American Medical Colleges Learn Serve Lead: Proceedings of the 58th Annual Research in Medical Education Sessions):S69–S72. DOI: 10.1097/ACM.0000000000002912
- Sun GH. Prompt engineering for nurse educators. Nurse Educ. 2024;49(6):293–9. DOI: 10.1097/NNE.0000000000001705
- Kıyak YS, Emekli E. ChatGPT prompts for generating multiple-choice questions in medical education and evidence on their validity: A literature review. Postgrad Med J. 2024. DOI: 10.1093/postmj/qgae065
- Magzoub ME, Zafar I, Munshi F, Shersad F. Ten tips to harnessing generative AI for high-quality MCQs in medical education assessment. Med Educ Online. 2025;30(1):2532682. DOI: 10.1080/10872981.2025.2532682
- Wass R, Golding C. Sharpening a tool for teaching: The zone of proximal development. Teach High Educ. 2014;19(6):671–84. DOI: 10.1080/13562517.2014.901958
- Leung CH. Promoting optimal learning with ChatGPT: A comprehensive exploration of prompt engineering in education. Asian Journal of Contemporary Education. 2024;8(2):104–14. DOI: 10.55493/5052.v8i2.5101
- Heston TF, Khun C. Prompt engineering in medical education. International Medical Education. 2023;2(3):198–205. DOI: 10.3390/ime2030019
- Stadler M, Horrer A, Fischer MR. Crafting medical MCQs with generative AI: A how-to guide on leveraging ChatGPT. GMS J Med Educ. 2024;41(2):Doc20.
- Birks S, Gray J, Darling-Pomranz C. Using artificial intelligence to provide a ‘flipped assessment’ approach to medical education learning opportunities. Med Teach. 2025;47(8):1377–84. DOI: 10.1080/0142159X.2024.2434101
