Is GPT-4 capable of passing MIR 2023? Comparison between GPT-4 and ChatGPT-3 in the MIR 2022 and 2023 exams
Supporting Agencies
- No funding was received
Abstract
Introduction: Artificial intelligence (AI) is generating new controversies, opportunities and challenges in medical education. This study evaluates the ability of the AI models ChatGPT-3 and GPT-4 to answer questions from the MIR exam, the entrance exam for specialized medical training in Spain, comparing performance between the 2022 and 2023 exams.
Methodology: A descriptive cross-sectional study was conducted, using GPT-4 to answer the 210 questions of the MIR 2023 exam and comparing the results with those of ChatGPT-3 in the MIR 2022 exam. Statistical analysis was used to determine the percentage of correct answers according to specialty, type of question, and question content.
Results: GPT-4 achieved 173 correct answers out of 210 questions, outperforming ChatGPT-3, which obtained 108 correct answers in the previous year's exam. A marked improvement was observed in specialties such as Rheumatology, Paediatrics, Geriatrics and Oncology, although some fields such as Pneumology and Ophthalmology showed less progress or even lower results.
Conclusion: GPT-4 outperformed ChatGPT-3, indicating advances in AI data processing and analysis, as well as in contextual understanding and the application of medical knowledge. However, the article emphasizes the importance of recognizing the limitations of AI and the need for a critical approach in medical education.
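The percentage-of-correct-answers analysis described in the Methodology can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `percent_correct` and `percent_by_group` are hypothetical, and the per-question records are invented for illustration. Only the overall GPT-4 score (173 of 210) comes from the abstract.

```python
from collections import defaultdict


def percent_correct(correct: int, total: int) -> float:
    """Return the percentage of correct answers, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 1)


def percent_by_group(records):
    """Group (label, is_correct) pairs and return {label: percent correct}."""
    tally = defaultdict(lambda: [0, 0])  # label -> [correct, total]
    for label, is_correct in records:
        tally[label][0] += int(is_correct)
        tally[label][1] += 1
    return {label: percent_correct(c, n) for label, (c, n) in tally.items()}


# Overall GPT-4 score reported in the abstract: 173 of 210 questions.
gpt4_overall = percent_correct(173, 210)

# Hypothetical per-question records, for illustration only (not study data).
sample = [
    ("Specialty A", True),
    ("Specialty A", False),
    ("Specialty B", True),
    ("Specialty B", True),
]
by_specialty = percent_by_group(sample)

print(gpt4_overall)
print(by_specialty)
```

The same grouping function could be applied to question type or question content by changing the label in each record.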
Copyright (c) 2024 Servicio de Publicaciones de la Universidad de Murcia
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The works published in this journal are subject to the following terms:
1. The Publications Service of the University of Murcia (the publisher) retains the economic rights (copyright) of the published works and encourages and permits their reuse under the license indicated in point 2.
2. The works are published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license.
3. Self-archiving conditions. Authors are allowed and encouraged to disseminate electronically the pre-print version (the version before evaluation, as submitted to the journal) and/or the post-print version (the version evaluated and accepted for publication) of their works before publication, since this favors earlier circulation and dissemination and, with it, a possible increase in citations and reach within the academic community.