Comparison of Automatic Item Generation Methods in the Assessment of Clinical Reasoning Skills

Authors

DOI: https://doi.org/10.6018/edumed.637221
Keywords: clinical reasoning, automated item generation, template-based method, ChatGPT, multiple-choice questions

Abstract

Automatic item generation (AIG) methods offer potential for assessing clinical reasoning (CR), a critical skill that combines intuitive and analytical thinking, in medical education. In preclinical education, CR is commonly evaluated through written exams and case-based multiple-choice questions (MCQs), which are widely used because of large student cohorts, ease of standardization, and rapid scoring. This study generated CR-focused questions for medical exams using the two primary AIG methods: template-based and non-template-based (using AI tools such as ChatGPT for a more flexible approach). A total of 18 questions were produced on ordering radiologic investigations for abdominal emergencies, alongside faculty-developed questions from past medical exams for comparison. Experienced radiologists evaluated the questions for clarity, clinical relevance, and effectiveness in measuring CR. ChatGPT-generated questions measured CR with an 84.52% success rate, faculty-developed questions with 82.14%, and template-based questions with 78.57%, indicating that both AIG methods are effective for CR assessment, with ChatGPT performing slightly better. Both AIG methods received high ratings for clarity and clinical suitability, producing CR-assessing questions comparable to, and in some cases surpassing, faculty-developed questions. Although template-based AIG requires more time and effort up front, both methods may save educators time in exam preparation.
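The template-based AIG approach described above can be illustrated with a minimal sketch: an item model (a question stem with fillable slots) is crossed with lists of permissible values to yield many item variants. The stem and clinical variables below are hypothetical examples for illustration only, not the actual templates used in the study.

```python
import itertools

# Hypothetical item model: a stem with two fillable slots.
STEM = ("A {age}-year-old patient presents to the emergency department "
        "with {complaint}. Which imaging study is most appropriate?")

# Illustrative values the generator may substitute into each slot.
SLOTS = {
    "age": ["25", "67"],
    "complaint": ["right lower quadrant pain", "diffuse abdominal pain"],
}

def generate_items(stem, slots):
    """Yield one item stem per combination of slot values."""
    keys = list(slots)
    for combo in itertools.product(*(slots[k] for k in keys)):
        yield stem.format(**dict(zip(keys, combo)))

items = list(generate_items(STEM, SLOTS))
# 2 ages x 2 complaints -> 4 generated stems
```

In practice, a cognitive model also constrains which slot combinations are clinically plausible and determines the matching key and distractors, which is where the extra authoring effort of the template-based method lies.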



Published
25-11-2024
How to Cite
Emekli, E., & Karahan, B. N. (2024). Comparison of Automatic Item Generation Methods in the Assessment of Clinical Reasoning Skills. Revista Española de Educación Médica, 6(1). https://doi.org/10.6018/edumed.637221

Academic society: Universidad de Murcia
Publisher: Ediciones de la Universidad de Murcia (Editum)