When Assessment Theory Meets Generative AI: Reimagining SBA Design in Medical Education

Open Access | Mar 2026

References

  1. McCoubrie P. Improving the fairness of multiple-choice questions: A literature review. Med Teach. 2004;26(8):709–12. DOI: 10.1080/01421590400013495
  2. Mirbahai L, Adie JW. Applying the utility index to review single best answer questions in medical education assessment. Archives of Epidemiology and Public Health. 2020;2(1). DOI: 10.15761/AEPH.1000113
  3. Karthikeyan S, O’Connor E, Hu W. Barriers and facilitators to writing quality items for medical school assessments – a scoping review. BMC Med Educ. 2019;19(1):123. DOI: 10.1186/s12909-019-1544-8
  4. Artsi Y, Sorin V, Konen E, Glicksberg BS, Nadkarni G, Klang E. Large language models for generating medical examinations: Systematic review. BMC Med Educ. 2024;24(1):354. DOI: 10.1186/s12909-024-05239-y
  5. Ahmed A, Kerr E, O’Malley A. Quality assurance and validity of AI-generated single best answer questions. BMC Med Educ. 2025;25(1):300. DOI: 10.1186/s12909-025-06881-w
  6. Kaya M, Sonmez E, Halici A, Yildirim H, Coskun A. Comparison of AI-generated and clinician-designed multiple-choice questions in emergency medicine exam: A psychometric analysis. BMC Med Educ. 2025;25(1):949. DOI: 10.1186/s12909-025-07528-6
  7. Wu H, Zerner T, Lee D, Court-Kowalski S, Devitt P, Palmer E. GPT-4 versus human authors in clinically complex MCQ creation: A blinded analysis of item quality. Med Teach. 2025;1–14. DOI: 10.21203/rs.3.rs-4831476/v1
  8. Van Der Vleuten CPM. The assessment of professional competence: Developments, research and practical implications. Adv Health Sci Educ. 1996;1(1):41–67. DOI: 10.1007/BF00596229
  9. Tavakol M, Dennick R. Making sense of Cronbach’s alpha. Int J Med Educ. 2011;2:53–5. DOI: 10.5116/ijme.4dfb.8dfd
  10. Heeneman S, de Jong LH, Dawson LJ, Wilkinson TJ, Ryan A, Tait GR, et al. Ottawa 2020 consensus statement for programmatic assessment – 1. Agreement on the principles. Med Teach. 2021;43(10):1139–48. DOI: 10.1080/0142159X.2021.1957088
  11. Pham H, Besanko J, Devitt P. Examining the impact of specific types of item-writing flaws on student performance and psychometric properties of the multiple choice question. MedEdPublish (2016). 2018;7:225. DOI: 10.15694/mep.2018.0000225.1
  12. Lee HY, Yune SJ, Lee SY, Im S, Kam BS. The impact of repeated item development training on the prediction of medical faculty members’ item difficulty index. BMC Med Educ. 2024;24(1):599. DOI: 10.1186/s12909-024-05577-x
  13. Webb EM, Phuong JS, Naeger DM. Does educator training or experience affect the quality of multiple-choice questions? Acad Radiol. 2015;22(10):1317–22. DOI: 10.1016/j.acra.2015.06.012
  14. Taheri R, Nazemi N, Pennington SE, Clark JA, Dadgostari F. Factors influencing educators’ AI adoption: A grounded meta-analysis review. Computers and Education: Artificial Intelligence. 2025;9:100464. DOI: 10.1016/j.caeai.2025.100464
  15. Komasawa N, Yokohira M. Generative artificial intelligence (AI) in medical education: A narrative review of the challenges and possibilities for future professionalism. Cureus. 2025;17(6):e86316. DOI: 10.7759/cureus.86316
  16. Khakpaki A. Advancements in artificial intelligence transforming medical education: A comprehensive overview. Med Educ Online. 2025;30(1):2542807. DOI: 10.1080/10872981.2025.2542807
  17. Preiksaitis C, Rose C. Opportunities, challenges, and future directions of generative artificial intelligence in medical education: Scoping review. JMIR Med Educ. 2023;9:e48785. DOI: 10.2196/48785
  18. Downing SM. The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract. 2005;10(2):133–43. DOI: 10.1007/s10459-004-4019-5
  19. Russell RG, Lovett Novak L, Patel M, Garvey KV, Craig KJT, Jackson GP, et al. Competencies for the use of artificial intelligence-based tools by health care professionals. Acad Med. 2023;98(3):348–56. DOI: 10.1097/ACM.0000000000004963
  20. Storey VC, Yue WT, Zhao JL, Lukyanenko R. Generative artificial intelligence: Evolving technology, growing societal impact, and opportunities for information systems research. Inf Syst Front. 2025;27(5):2081–102. DOI: 10.1007/s10796-025-10581-7
  21. Ng IKS, Goh WGW, Teo DB, Chong KM, Tan LF, Teoh CM. Clinical reasoning in real-world practice: A primer for medical trainees and practitioners. Postgrad Med J. 2024;101(1191):68–75. DOI: 10.1093/postmj/qgae079
  22. Gruppen LD. Clinical reasoning: Defining it, teaching it, assessing it, studying it. West J Emerg Med. 2017;18(1):4–7. DOI: 10.5811/westjem.2016.11.33191
  23. Ngo A, Gupta S, Perrine O, Reddy R, Ershadi S, Remick D. ChatGPT 3.5 fails to write appropriate multiple choice practice exam questions. Acad Pathol. 2024;11(1):100099. DOI: 10.1016/j.acpath.2023.100099
  24. Messick S. Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Am Psychol. 1995;50(9):741. DOI: 10.1037/0003-066X.50.9.741
  25. Boscardin CK, Gin B, Golde PB, Hauer KE. ChatGPT and generative artificial intelligence for medical education: Potential impact and opportunity. Acad Med. 2024;99(1):22–7. DOI: 10.1097/ACM.0000000000005439
  26. Kıyak YS, Coşkun Ö, Budakoğlu I, Uluoğlu C. ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam. Eur J Clin Pharmacol. 2024;80(5):729–35. DOI: 10.1007/s00228-024-03649-x
  27. Cross JL, Choma MA, Onofrey JA. Bias in medical AI: Implications for clinical decision-making. PLOS Digit Health. 2024;3(11):e0000651. DOI: 10.1371/journal.pdig.0000651
  28. Masters K, MacNeil H, Benjamin J, Carver T, Nemethy K, Valanci-Aroesty S, et al. Artificial intelligence in health professions education assessment: AMEE Guide No. 178. Med Teach. 2025;47(9):1410–24. DOI: 10.1080/0142159X.2024.2445037
  29. Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, et al. Chain-of-thought prompting elicits reasoning in large language models. Adv Neural Inf Process Syst. 2022;35:24824–37.
  30. Cho Y, Park GL, Waite GN, Mudigonda A, Szarek JL. Development of a universal prompt as a scalable generative AI-assisted tool for USMLE Step 1 style multiple-choice question refinement in medical education. Med Sci Educ. 2025;35(2):611–3. DOI: 10.1007/s40670-025-02334-7
  31. Norcini JJ, McKinley DW. Assessment methods in medical education. Teach Teach Educ. 2007;23(3):239–50. DOI: 10.1016/j.tate.2006.12.021
  32. Barker AP. Artificial intelligence in health education within higher education institutions. Evid Based Nurs. 2025;28(3):147. DOI: 10.1136/ebnurs-2025-104314
  33. Greenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O. Diffusion of innovations in service organizations: Systematic review and recommendations. Milbank Q. 2004;82(4):581–629. DOI: 10.1111/j.0887-378X.2004.00325.x
  34. Moldt JA, Festl-Wietek T, Fuhl W, Zabel S, Claassen M, Wagner S, et al. Assessing AI awareness and identifying essential competencies: Insights from key stakeholders in integrating AI into medical education. JMIR Med Educ. 2024;10:e58355. DOI: 10.2196/58355
  35. Capan Melser M, Steiner-Hofbauer V, Lilaj B, Agis H, Knaus A, Holzinger A. Knowledge, application and how about competence? Qualitative assessment of multiple-choice questions for dental students. Med Educ Online. 2020;25(1):1714199. DOI: 10.1080/10872981.2020.1714199
  36. Tolentino R, Baradaran A, Gore G, Pluye P, Abbasgholizadeh-Rahimi S. Curriculum frameworks and educational programs in AI for medical students, residents, and practicing physicians: Scoping review. JMIR Med Educ. 2024;10:e54793. DOI: 10.2196/54793
  37. D’Souza R, Mathew M, Mishra V, Surapaneni KM. Twelve tips for addressing ethical concerns in the implementation of artificial intelligence in medical education. Med Educ Online. 2024;29(1). DOI: 10.1080/10872981.2024.2330250
  38. Chadha N, Popil E, Gregory J, Armstrong-Davies L, Justin G. How do we teach generative artificial intelligence to medical educators? Pilot of a faculty development workshop using ChatGPT. Med Teach. 2024;1–3. DOI: 10.1080/0142159X.2024.2341806
  39. Youm J, Corral J. Technological pedagogical content knowledge among medical educators: What is our readiness to teach with technology? Acad Med. 2019;94(11S Association of American Medical Colleges Learn Serve Lead: Proceedings of the 58th Annual Research in Medical Education Sessions):S69–S72. DOI: 10.1097/ACM.0000000000002912
  40. Sun GH. Prompt engineering for nurse educators. Nurse Educ. 2024;49(6):293–9. DOI: 10.1097/NNE.0000000000001705
  41. Kıyak YS, Emekli E. ChatGPT prompts for generating multiple-choice questions in medical education and evidence on their validity: A literature review. Postgrad Med J. 2024. DOI: 10.1093/postmj/qgae065
  42. Magzoub ME, Zafar I, Munshi F, Shersad F. Ten tips to harnessing generative AI for high-quality MCQs in medical education assessment. Med Educ Online. 2025;30(1):2532682. DOI: 10.1080/10872981.2025.2532682
  43. Wass R, Golding C. Sharpening a tool for teaching: The zone of proximal development. Teach High Educ. 2014;19(6):671–84. DOI: 10.1080/13562517.2014.901958
  44. Leung CH. Promoting optimal learning with ChatGPT: A comprehensive exploration of prompt engineering in education. Asian Journal of Contemporary Education. 2024;8(2):104–14. DOI: 10.55493/5052.v8i2.5101
  45. Heston TF, Khun C. Prompt engineering in medical education. International Medical Education. 2023;2(3):198–205. DOI: 10.3390/ime2030019
  46. Stadler M, Horrer A, Fischer MR. Crafting medical MCQs with generative AI: A how-to guide on leveraging ChatGPT. GMS J Med Educ. 2024;41(2):Doc20.
  47. Birks S, Gray J, Darling-Pomranz C. Using artificial intelligence to provide a ‘flipped assessment’ approach to medical education learning opportunities. Med Teach. 2025;47(8):1377–84. DOI: 10.1080/0142159X.2024.2434101
DOI: https://doi.org/10.5334/pme.2033 | Journal eISSN: 2212-277X
Language: English
Submitted on: Aug 1, 2025 | Accepted on: Dec 22, 2025 | Published on: Mar 12, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Nora Al-Shawee, Gerry McElvaney, Judith Strawbridge, Muirne Spooner, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.