Evaluating the Performance of DeepSeek 3, Claude Sonnet 4, and Gemini 2.5 in the Chilean Medical Licensing Examination: Observational Study.

Anaís Aracelly Lancellotti Guajardo; Oscar Jerez Yañez; Vicente Alberto Edgardo Jesus Silva Arroyo; Marcos Jeremías Giovanny Vera Cartes; Álvaro Andrés Herrera Alcaíno

doi:10.6018/edumed.679731

Authors

Anaís Aracelly Lancellotti Guajardo Faculty of Medicine, University of Chile, Santiago, Chile. https://orcid.org/0009-0003-2254-0470
Oscar Jerez Yañez Department of Health Sciences Education, Faculty of Medicine, University of Chile, Santiago, Chile. https://orcid.org/0000-0003-0869-5938
Vicente Alberto Edgardo Jesus Silva Arroyo Faculty of Medicine, University of Chile, Santiago, Chile. https://orcid.org/0009-0001-4182-0115
Marcos Jeremías Giovanny Vera Cartes Faculty of Medicine, University of Chile, Santiago, Chile. https://orcid.org/0009-0009-9156-7419
Álvaro Andrés Herrera Alcaíno Faculty of Medicine, University of Chile, Santiago, Chile. https://orcid.org/0009-0007-4861-2144

Keywords: AI, Chilean MIR, EUNACOM

Abstract

Introduction: Artificial intelligence and its continuous improvement have revolutionized medical education, but its performance in specific assessment contexts still requires further exploration. Methods: This study evaluated and qualitatively compared the performance of three state-of-the-art language models (Claude Sonnet 4, Gemini 2.5, and DeepSeek 3) on simulations of the National Medical Knowledge Examination (EUNACOM) in Chile. Three mock exams of 180 questions each were used, covering a range of medical areas and question types, including clinical case-based questions. Results: All AI models passed the exams consistently; Claude Sonnet 4 achieved the highest overall performance (89% accuracy) and the greatest consistency across all attempts. Clinical case-based questions were answered more accurately than theoretical knowledge questions, which highlights the models' strength in contextual clinical reasoning. Claude excelled in Internal Medicine and Psychiatry, DeepSeek in Surgery, and Gemini showed balanced performance. However, specific weaknesses were identified in areas such as public health and clinical follow-up, suggesting the need for model-specific adjustments. Conclusion: The findings support the educational potential of these tools but also emphasize the importance of using them ethically, under supervision, and as a complement to traditional medical training. This study contributes to understanding the emerging role of artificial intelligence in professional assessments, as well as its limitations and opportunities in the Chilean medical context.
