Using Large Language Models to Generate Script Concordance Test in Medical Education: ChatGPT and Claude

Authors

  • Yavuz Selim Kıyak, Department of Medical Education and Informatics, Gazi University Faculty of Medicine, Ankara, Turkey. https://orcid.org/0000-0002-5026-3234
  • Emre Emekli, Department of Radiology, Faculty of Medicine, Eskişehir Osmangazi University, Eskişehir, Turkey
DOI: https://doi.org/10.6018/edumed.636331
Keywords: script concordance test, clinical reasoning, medical education, artificial intelligence, ChatGPT

Abstract

We aimed to determine the quality of AI-generated Script Concordance Test (SCT) items through an expert panel. In April 2024, we generated SCT items on abdominal radiology using a complex prompt in two large language model (LLM) chatbots, ChatGPT-4 and Claude 3 (Sonnet), and evaluated the items’ quality through an expert panel of 16 radiologists. The panel, blinded to the origin of the items, which were provided without modification, independently answered each item and assessed it against 12 quality indicators. Data analysis included descriptive statistics, bar charts comparing response distributions against accepted forms, and a heatmap showing performance on the quality indicators. According to the panel, the chatbot-generated SCT items assessed clinical reasoning rather than only factual recall (ChatGPT: 92.50%; Claude: 85.00%). The heatmap indicated that the items were generally acceptable, with most responses favorable across the quality indicators (ChatGPT: 71.77%; Claude: 64.23%). Comparing the bar charts with acceptable and unacceptable forms showed that 73.33% of the questions in ChatGPT’s items and 53.33% of those in Claude’s could be considered acceptable. Using LLMs to generate SCT items can help medical educators by reducing the time and effort required. Although the prompt provides a good starting point, it remains crucial to review and revise AI-generated SCT items before educational use. The prompt and the custom GPT, “Script Concordance Test Generator”, available at https://chatgpt.com/g/g-RlzW5xdc1-script-concordance-test-generator, can streamline SCT item development.
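For readers who would rather script the generation step than work in the chatbot web interfaces the study relied on, a minimal sketch is shown below. It assumes the OpenAI Python SDK and an API key; the model name, the parameters, and the abbreviated prompt are illustrative placeholders, not the full published prompt or the study's actual workflow.

```python
# Hypothetical sketch: generating one SCT item through an LLM API rather than
# the chatbot web interfaces used in the study. The model name and the
# abbreviated prompt are illustrative assumptions, not the published prompt.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SCT_PROMPT = """Generate one Script Concordance Test item on abdominal radiology.
Include:
1. A short, deliberately ambiguous clinical vignette.
2. Three questions, each pairing a hypothesis ("If you were thinking of ...")
   with a new finding ("... and then you find ...").
3. A 5-point response scale from -2 (hypothesis ruled out) to +2 (strongly supported)."""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": SCT_PROMPT}],
    temperature=0.7,  # some variability suits the intentional ambiguity of SCT items
)
print(response.choices[0].message.content)
```

Items produced this way would still need the expert review and revision the abstract recommends before any educational use.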



Published
03-12-2024
How to Cite
Kıyak, Y. S., & Emekli, E. (2024). Using Large Language Models to Generate Script Concordance Test in Medical Education: ChatGPT and Claude. Revista Española de Educación Médica, 6(1). https://doi.org/10.6018/edumed.636331
