Comparison of Automatic Item Generation Methods in the Assessment of Clinical Reasoning Skills
Abstract
Automatic item generation (AIG) methods offer potential for assessing clinical reasoning (CR), a critical skill in medical education that combines intuitive and analytical thinking. In preclinical education, CR is commonly assessed through written exams and case-based multiple-choice questions (MCQs), which are widely used because of large student numbers, ease of standardization, and rapid scoring. This study generated CR-focused questions for medical exams using the two main AIG approaches: template-based generation and non-template-based generation (a more flexible approach using AI tools such as ChatGPT). A total of 18 questions on ordering radiologic investigations for abdominal emergencies were produced and compared with faculty-developed questions used in medical exams. Experienced radiologists rated the questions on clarity, clinical relevance, and effectiveness in measuring CR. ChatGPT-generated questions measured CR with an 84.52% success rate, faculty-developed questions with 82.14%, and template-based questions with 78.57%, indicating that both AIG methods are effective for CR assessment, with ChatGPT performing slightly better. Both AIG methods also received high ratings for clarity and clinical suitability, showing that they can produce CR-assessing questions comparable to, and in some cases surpassing, faculty-developed questions. Template-based AIG is effective but demands more time and effort than the ChatGPT-based approach; even so, both methods may save educators time in exam preparation.
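Because the abstract contrasts the two AIG approaches only at a high level, the minimal Python sketch below illustrates the general logic of template-based AIG: an item model (a vignette stem with variable slots) is filled programmatically to yield many parallel MCQs. All names, clinical values, and the answer rule are hypothetical illustrations, not the templates, variables, or cognitive models used in this study; the non-template-based approach would instead prompt a chatbot such as ChatGPT to draft the full vignette.

```python
# Illustrative sketch only: the vignette template, variable values, and answer
# rule below are invented for demonstration and are not the item models or
# cognitive models used in the study.
from itertools import product

STEM_TEMPLATE = (
    "A {age}-year-old {sex} presents to the emergency department with "
    "{presentation}. Physical examination reveals {finding}. "
    "Which imaging study is the most appropriate next step?"
)

AGES = ["25", "68"]
SEXES = ["woman", "man"]

# Each scenario couples a presentation with a matching examination finding so
# that every generated vignette remains clinically coherent.
SCENARIOS = [
    ("right lower quadrant pain", "rebound tenderness at McBurney's point"),
    ("colicky abdominal pain and vomiting",
     "a distended, tympanic abdomen with high-pitched bowel sounds"),
]

OPTIONS = [
    "Abdominal ultrasound",
    "Contrast-enhanced abdominal CT",
    "Plain abdominal radiograph",
    "MRI of the abdomen",
]

def pick_key(age: str, presentation: str) -> str:
    """Toy stand-in for the cognitive model that maps variables to the key."""
    if "right lower quadrant" in presentation:
        # Suspected appendicitis: ultrasound first in the young patient,
        # CT in the older patient (simplified rule for illustration).
        return "Abdominal ultrasound" if age == "25" else "Contrast-enhanced abdominal CT"
    # Suspected bowel obstruction in this toy rule.
    return "Plain abdominal radiograph"

def generate_items():
    """Fill the stem template with every combination of variable values."""
    for age, sex, (presentation, finding) in product(AGES, SEXES, SCENARIOS):
        yield {
            "stem": STEM_TEMPLATE.format(age=age, sex=sex,
                                         presentation=presentation, finding=finding),
            "options": OPTIONS,
            "key": pick_key(age, presentation),
        }

if __name__ == "__main__":
    for item in generate_items():
        print(item["stem"], "->", item["key"])
```

In a template-based workflow of this kind, the measurement logic is fixed in advance by the item model, which is consistent with the abstract's observation that this approach requires more up-front time and effort than prompting a chatbot.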