Evaluation of the responses from different chatbots to frequently asked patient questions about impacted canines

Elif Gökçe Erkan Acar; Başak Arslan Avan

doi:10.2478/aoj-2025-0020



Australasian Orthodontic Journal

Volume 41 (2025): Issue 1 (January 2025)


Open Access

Published: Sep 2025

Abstract

Background

This study aimed to evaluate the reliability, accuracy and readability of the responses given by the ChatGPT 4.0, Google Gemini 1.5 and Claude 3.5 Sonnet chatbots to questions about impacted canines.

Methods

Thirty-five questions were posed to three different chatbots, yielding 105 responses. The answers were evaluated for reliability (modified DISCERN), accuracy (a Likert scale and the Accuracy of Information (AOI) index) and readability (Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL)). Statistical significance was set at p<0.05.
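The two readability measures named above are computed from sentence length and syllable density. The sketch below illustrates the standard published formulas; it is not the tool the authors used, which the abstract does not name. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration, so its scores will differ slightly from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a heuristic assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent (e.g. "canine"), except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for a block of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Flesch Reading Ease Score: higher values indicate easier text.
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid Grade Level: approximates a US school grade.
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl
```

Calling `readability(response_text)` on each chatbot answer would give one FRES and one FKGL value per response. Longer sentences and more syllables per word lower the FRES and raise the FKGL, which is how a response can be rated as requiring a college reading level.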

Results

Gemini had the highest modified DISCERN score (33.66 ± 2.64), followed by Claude (29.70 ± 3.08) and ChatGPT (28.13 ± 2.83). ChatGPT had the highest mean Likert score (4.76 ± 0.43), while Claude and Gemini scored 4.71 ± 0.47 and 4.66 ± 0.47, respectively. For the AOI, ChatGPT had the highest mean score (8.67 ± 0.55), which was significantly higher than those of the other chatbots (ChatGPT vs Claude: p=0.042, ChatGPT vs Gemini: p=0.036). The FRES and FKGL readability scores did not differ significantly between the chatbots (p=0.121 and p=0.377, respectively). Claude's responses contained significantly fewer words than those of the other chatbots (Claude vs ChatGPT: p=0.019, Claude vs Gemini: p=0.001), and ChatGPT used the most words (239.74 ± 114.21).

Conclusions

In answering questions about impacted canines, Gemini showed good reliability, while ChatGPT and Claude showed moderate reliability. All chatbots achieved high scores for accuracy. However, the responses were difficult to understand for anyone below a college reading level. Chatbots can serve as a resource for patients seeking general information about impacted canines, potentially enhancing and expediting clinician–patient communication. The limited readability of chatbot-generated text may nonetheless hinder overall comprehension. Moreover, because cases vary between patients, the most accurate interpretation should be provided by the patient’s healthcare professional. In the future, advances in chatbot technology and increased integration between healthcare providers may improve outcomes across all parameters.



DOI: https://doi.org/10.2478/aoj-2025-0020 | Journal eISSN: 2207-7480 | Journal ISSN: 2207-7472


Language: English

Page range: 288–300

Submitted on: Mar 1, 2025

Accepted on: May 1, 2025

Published on: Sep 1, 2025

Published by: Australian Society of Orthodontists Inc.

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Related subjects: Medicine; Basic medical science; Basic medical science, other

© 2025 Elif Gökçe Erkan Acar, Başak Arslan Avan, published by Australian Society of Orthodontists Inc.
This work is licensed under the Creative Commons Attribution 4.0 License.
