Can GPT-4 Pass the MIR 2023? A Comparison of GPT-4 and ChatGPT-3 on the MIR 2022 and 2023 Exams

Alvaro Cerame; Juan Juaneda; Pablo Estrella-Porter; Lucía de la Puente; Joaquín Navarro; Eva García; Domingo A. Sánchez; Juan Pablo Carrasco

doi:10.6018/edumed.604091

Authors

Alvaro Cerame. Plan de Atención Integral al Profesional Sanitario Enfermo, Servicio Madrileño de Salud, Madrid. ORCID: https://orcid.org/0000-0003-0469-8461
Juan Juaneda. Servicio de Medicina Preventiva y Salud Pública, Hospital Universitari i Politècnic La Fe, Valencia. ORCID: https://orcid.org/0000-0002-6048-2457
Pablo Estrella-Porter. Servicio de Medicina Preventiva, Hospital Clínico Universitario de Valencia, Valencia. ORCID: https://orcid.org/0000-0003-4137-7691
Lucía de la Puente. Departamento de Atención Primaria, Hospital Universitari i Politècnic La Fe, Valencia. ORCID: https://orcid.org/0009-0007-3263-5691
Joaquín Navarro. Servicio de Cuidados Intensivos, Área de Gestión Sanitaria Norte de Huelva, Huelva. ORCID: https://orcid.org/0000-0002-7983-7289
Eva García. Servicio de Cardiología, Complejo Hospitalario Universitario Toledo, Toledo. ORCID: https://orcid.org/0000-0001-8962-6023
Domingo A. Sánchez. Servicio de Oncología Médica, Hospital Universitario Morales Meseguer, Grupo de Oncología Clínica y Translacional IMIB-Arrixaca, Murcia. ORCID: https://orcid.org/0000-0003-2073-0679
Juan Pablo Carrasco. Servicio de Psiquiatría, Hospital Provincial de Castellón, Castellón. ORCID: https://orcid.org/0000-0001-9137-7775

Keywords: Artificial Intelligence, ChatGPT-3, GPT4, Medical Education, AI, MIR, Resident Physician

Funding

No funding was received.

Abstract

Introduction: Artificial intelligence (AI) is generating new controversies, opportunities, and risks in medical education. This study evaluates the ability of the AI models ChatGPT-3 and GPT-4 to answer questions from the MIR, Spain's entrance examination for specialized medical training, comparing performance between the 2022 and 2023 exam sittings.

Methods: A descriptive cross-sectional study was conducted in which GPT-4 answered the 210 questions of the MIR 2023 exam, and its results were compared with those of ChatGPT-3 on the MIR 2022 exam. Statistical analysis was used to determine the percentage of correct answers by specialty, question type, and question content.

Results: GPT-4 answered 173 of 210 questions correctly, outperforming ChatGPT-3, which answered 108 correctly on the previous year's exam. Marked improvement was observed in specialties such as Rheumatology, Pediatrics, Geriatrics, and Oncology, although some fields, such as Pulmonology and Ophthalmology, showed less progress or even lower results.

Conclusion: GPT-4 performed better than ChatGPT-3, indicating advances in AI data processing and analysis, as well as in contextual understanding and the application of medical knowledge. However, the importance of recognizing the limitations of AI and the need for a critical approach to its use in medical education are emphasized.
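
The raw counts reported in the abstract can be turned into percentages of correct answers with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis code. The abstract gives the total number of questions for the 2023 exam (210) but not for the 2022 exam, so the 2022 denominator used here (`TOTAL_2022_ASSUMED`) is an assumption, and the function and variable names are hypothetical.

```python
# Counts reported in the abstract.
GPT4_CORRECT_2023 = 173
TOTAL_2023 = 210
CHATGPT3_CORRECT_2022 = 108

# ASSUMPTION: the abstract does not state the 2022 total. 210 is used here
# purely for illustration; replace it with the actual figure if it differs.
TOTAL_2022_ASSUMED = 210


def accuracy_pct(correct: int, total: int) -> float:
    """Return the percentage of correct answers, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return round(100 * correct / total, 1)


gpt4_pct = accuracy_pct(GPT4_CORRECT_2023, TOTAL_2023)
chatgpt3_pct = accuracy_pct(CHATGPT3_CORRECT_2022, TOTAL_2022_ASSUMED)

# Difference in raw correct answers; this does not depend on the assumption.
extra_correct = GPT4_CORRECT_2023 - CHATGPT3_CORRECT_2022

print(f"GPT-4, MIR 2023: {gpt4_pct}% correct")
print(f"ChatGPT-3, MIR 2022 (assumed total): {chatgpt3_pct}% correct")
print(f"Additional correct answers: {extra_correct}")
```

Only the GPT-4 percentage and the difference in raw counts follow directly from the figures in the abstract; the ChatGPT-3 percentage depends on the assumed denominator.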
