Can GPT-4 pass the MIR 2023? A comparison of GPT-4 and ChatGPT-3 on the MIR 2022 and 2023 exams

Alvaro Cerame; Juan Juaneda; Pablo Estrella-Porter; Lucía de la Puente; Joaquín Navarro; Eva García; Domingo A. Sánchez; Juan Pablo Carrasco

doi:10.6018/edumed.604091

Authors

Alvaro Cerame, Plan de Atención Integral al Profesional Sanitario Enfermo, Servicio Madrileño de Salud, Madrid. ORCID: https://orcid.org/0000-0003-0469-8461
Juan Juaneda, Servicio de Medicina Preventiva y Salud Pública, Hospital Universitari i Politècnic La Fe, Valencia. ORCID: https://orcid.org/0000-0002-6048-2457
Pablo Estrella-Porter, Servicio de Medicina Preventiva, Hospital Clínico Universitario de Valencia, Valencia. ORCID: https://orcid.org/0000-0003-4137-7691
Lucía de la Puente, Departamento de Atención Primaria, Hospital Universitari i Politècnic La Fe, Valencia. ORCID: https://orcid.org/0009-0007-3263-5691
Joaquín Navarro, Servicio de Cuidados Intensivos, Área de Gestión Sanitaria Norte de Huelva, Huelva. ORCID: https://orcid.org/0000-0002-7983-7289
Eva García, Servicio de Cardiología, Complejo Hospitalario Universitario Toledo, Toledo. ORCID: https://orcid.org/0000-0001-8962-6023
Domingo A. Sánchez, Servicio de Oncología Médica, Hospital Universitario Morales Meseguer, Grupo de Oncología Clínica y Translacional IMIB-Arrixaca, Murcia. ORCID: https://orcid.org/0000-0003-2073-0679
Juan Pablo Carrasco, Servicio de Psiquiatría, Hospital Provincial de Castellón, Castellón. ORCID: https://orcid.org/0000-0001-9137-7775

Keywords: Artificial Intelligence, AI, ChatGPT-3, GPT, Medical Education, Specialization exam, Resident Doctor

Supporting Agencies

No funding was received.

Abstract

Introduction: Artificial intelligence (AI) is generating new controversies, opportunities and challenges in medical education. This study evaluates the ability of two AI models, ChatGPT-3 and GPT-4, to answer questions from the MIR, the entrance exam for specialized medical training in Spain, comparing their performance on the 2022 and 2023 exams.

Methodology: A descriptive cross-sectional study was conducted in which GPT-4 answered the 210 questions of the MIR 2023 exam, and its results were compared with those of ChatGPT-3 on the MIR 2022 exam. The percentage of correct answers was analysed by specialty, question type and question content.

Results: GPT-4 answered 173 of the 210 questions correctly, outperforming ChatGPT-3, which obtained 108 correct answers on the previous exam. A marked improvement was observed in specialties such as Rheumatology, Paediatrics, Geriatrics and Oncology, although some fields, such as Pneumology and Ophthalmology, showed less progress or even lower results.

Conclusion: GPT-4 outperformed ChatGPT-3, indicating advances in AI data processing and analysis, as well as in its contextual understanding and application of medical knowledge. However, the study emphasizes the importance of recognising the limitations of AI and the need for a critical approach in medical education.
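
As a quick check on the headline figures in the Results, the short Python sketch below converts GPT-4's score into a percentage and counts the additional correct answers relative to ChatGPT-3. It is illustrative only and is not the authors' analysis code; the helper name `accuracy` is hypothetical. Because the abstract does not state how many questions ChatGPT-3 answered on the MIR 2022 exam, no percentage is computed for it.

```python
def accuracy(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Figures taken from the abstract.
gpt4_correct = 173      # GPT-4, MIR 2023
gpt4_total = 210        # questions in the MIR 2023 exam
chatgpt3_correct = 108  # ChatGPT-3, MIR 2022 (denominator not given in the abstract)

gpt4_pct = accuracy(gpt4_correct, gpt4_total)
extra_correct = gpt4_correct - chatgpt3_correct

print(f"GPT-4 accuracy: {gpt4_pct:.1f}%")
print(f"Additional correct answers vs ChatGPT-3: {extra_correct}")
```

The same helper could be applied per specialty or question type, provided the per-category counts from the full article are available.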
