Using Large Language Models to Generate Script Concordance Test in Medical Education: ChatGPT and Claude
Abstract
We aimed to determine the quality of AI-generated (ChatGPT-4 and Claude 3) Script Concordance Test (SCT) items through an expert panel. We generated SCT items on abdominal radiology using a complex prompt in two large language model (LLM) chatbots (ChatGPT-4 and Claude 3 Sonnet, April 2024) and evaluated item quality through an expert panel of 16 radiologists. The panel, blinded to the origin of the items, which were provided without modification, independently answered each item and assessed it against 12 quality indicators. Data analysis included descriptive statistics, bar charts comparing responses against accepted forms, and a heatmap showing performance on the quality indicators. The chatbot-generated SCT items were judged to assess clinical reasoning rather than only factual recall (ChatGPT: 92.50%, Claude: 85.00%). The heatmap indicated that the items were generally acceptable, with most responses favorable across the quality indicators (ChatGPT: 71.77%, Claude: 64.23%). Comparison of the bar charts against accepted and unaccepted forms showed that 73.33% (ChatGPT) and 53.33% (Claude) of the questions can be considered acceptable. Using LLMs to generate SCT items can help medical educators by reducing the time and effort required. Although the prompt provides a good starting point, it remains crucial to review and revise AI-generated SCT items before educational use. The prompt and the custom GPT, “Script Concordance Test Generator”, available at https://chatgpt.com/g/g-RlzW5xdc1-script-concordance-test-generator, can streamline SCT item development.
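The abstract describes the generation step only at a high level. As a rough illustration of what prompting a chatbot for an SCT item could look like programmatically, the Python sketch below sends a condensed SCT-generation request through the OpenAI SDK. The prompt wording, the model identifier, and the API-based workflow are illustrative assumptions, not the authors' published prompt or procedure; the study used a longer, complex prompt in the chatbot interfaces, and that prompt is not reproduced in this abstract.

```python
# Minimal sketch (not the authors' pipeline): querying an LLM with an
# SCT-generation request via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical condensed prompt. The item structure it asks for (ambiguous
# vignette, hypothesis, new information, 5-point effect scale) follows the
# standard SCT format; the study's actual prompt was more detailed.
SCT_PROMPT = (
    "Generate a Script Concordance Test item on abdominal radiology. "
    "Provide: (1) a short, deliberately ambiguous clinical vignette; "
    "(2) a hypothesis (a diagnosis, investigation, or management option); "
    "(3) a new piece of information; and (4) a question asking how the new "
    "information affects the hypothesis on a 5-point scale from -2 "
    "(ruled out / much less appropriate) to +2 (strongly supported)."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative; the study used ChatGPT-4 and, separately,
                    # Claude 3 Sonnet through its own interface (April 2024)
    messages=[{"role": "user", "content": SCT_PROMPT}],
)
print(response.choices[0].message.content)
```

However items are generated, the abstract's conclusion still applies: each AI-generated item should be reviewed and revised by subject-matter experts before educational use.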
Copyright (c) 2024 Publications Service of the University of Murcia
This work is published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Works published in this journal are subject to the following terms:
1. The Publications Service of the University of Murcia (the publisher) retains the economic rights (copyright) of the published works and favours and permits their reuse under the license stated in point 2.
2. Works are published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license.
3. Self-archiving conditions. Authors are permitted and encouraged to disseminate electronically the preprint (the version before peer review and submission to the journal) and/or postprint (the version reviewed and accepted for publication) versions of their works before publication, since this favours earlier circulation and dissemination, and with it a possible increase in citation and reach within the academic community.