Comparison of Automatic Item Generation Methods in the Assessment of Clinical Reasoning Skills
Abstract
Automatic item generation (AIG) offers a promising way to assess clinical reasoning (CR) in medical education, a critical skill that combines intuitive and analytical thinking. In preclinical education, CR is commonly assessed through written exams and case-based multiple-choice questions (MCQs), which are widely used because of large student numbers, ease of standardization, and speed of scoring. This study generated CR-focused questions for medical exams using the two main AIG approaches: template-based generation and non-template-based generation (a more flexible approach using AI tools such as ChatGPT). A total of 18 questions on ordering radiologic investigations for abdominal emergencies were produced and compared with faculty-developed questions used in medical exams. Experienced radiologists evaluated the questions for clarity, clinical relevance, and effectiveness in measuring CR. ChatGPT-generated questions measured CR with an 84.52% success rate, faculty-developed questions with 82.14%, and template-based questions with 78.57%, indicating that both AIG methods are effective for CR assessment, with ChatGPT performing slightly better. Both AIG methods also received high ratings for clarity and clinical suitability, producing questions comparable to, and in some cases surpassing, faculty-developed ones. Although template-based AIG is effective, it demands more time and effort than the ChatGPT-based approach; even so, both methods may save educators time in exam preparation.
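To illustrate the template-based approach mentioned above, the sketch below shows the core mechanism in minimal form: an item model (a stem template with variable "elements") whose value combinations are expanded into distinct case stems. The stem wording, element names, and clinical values here are hypothetical placeholders, not the study's actual item model; real template-based AIG would also encode a cognitive model, constraints excluding implausible combinations, and distractor generation.

```python
import itertools

# Hypothetical item model: a case stem with three variable elements.
STEM = ("A {age}-year-old {sex} presents to the emergency department with "
        "{symptom}. Which imaging study is most appropriate to order first?")

# Example element values (placeholders for illustration only).
ELEMENTS = {
    "age": ["25", "68"],
    "sex": ["man", "woman"],
    "symptom": ["acute right lower quadrant pain",
                "diffuse abdominal pain and distension"],
}

def generate_items(stem, elements):
    """Fill every combination of element values into the stem template."""
    keys = list(elements)
    return [stem.format(**dict(zip(keys, values)))
            for values in itertools.product(*(elements[k] for k in keys))]

items = generate_items(STEM, ELEMENTS)
print(len(items))  # 2 * 2 * 2 = 8 candidate stems
```

The combinatorial expansion explains why this method front-loads effort: designing one sound template and its constraints is costly, but it then yields many parallel items at once.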
Copyright (c) 2024 Publications Service of the University of Murcia
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Works published in this journal are subject to the following terms:
1. The Publications Service of the University of Murcia (the publisher) retains the economic rights (copyright) of the published works and favors and permits their reuse under the license of use indicated in point 2.
2. Works are published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license.
3. Self-archiving conditions. Authors are permitted and encouraged to disseminate electronically the pre-print (version before peer review and submission to the journal) and/or post-print (version reviewed and accepted for publication) versions of their works prior to publication, since this favors their earlier circulation and dissemination and, with it, a possible increase in their citation and reach within the academic community.