Predictive models of academic risk in computing careers with educational data mining


Keywords: educational data mining, academic risk, higher education


Poor academic performance and academic lag (delayed progress through a degree program) are recurring problems in higher education institutions, especially at the start of university studies. Detecting academic risk early makes it possible to implement educational interventions that address the factors behind poor academic performance in each student's particular context. The purpose of this study was to build predictive models of academic risk using educational data mining methods, specifically classification (prediction) techniques, to analyze, obtain and validate the models. The data consist of the admission exam results and sociodemographic data of 415 students enrolled in the computer science majors at the Autonomous University of Yucatán (Mexico) between 2016 and 2019. The results show that the best model was the one obtained with the LMT (Logistic Model Tree) classification algorithm, with a precision of 75.42% and an area under the ROC curve of 0.805. The analysis also identified the most predictive attributes; in particular, the sections of the undergraduate entrance exam were highly significant. The study proposes developing software tools for the early detection of academic risk, together with strategies for timely educational intervention.
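The abstract reports two evaluation figures for the LMT model: a precision of 75.42% and an area under the ROC curve of 0.805. As a minimal sketch of how such figures are derived from a classifier's output, the Python below computes accuracy, precision and ROC AUC from scratch. It assumes a binary target where 1 means "at academic risk" (a hypothetical encoding), a 0.5 decision threshold (also an assumption) and toy data; it does not reproduce the study's WEKA pipeline or its results. Because WEKA's default evaluation output reports overall accuracy ("Correctly Classified Instances") as a percentage and per-class precision as a fraction, the 75.42% figure may correspond to accuracy rather than to precision in the strict TP / (TP + FP) sense; the abstract alone does not settle this, so both are shown.

```python
from typing import Sequence

# Hypothetical encoding (not taken from the study): 1 = at academic risk, 0 = not.
AT_RISK = 1


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of students whose predicted class equals the observed class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = AT_RISK) -> float:
    """Of the students flagged as `positive`, the fraction that truly are: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = AT_RISK) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case receives a higher score than a randomly chosen
    negative case (ties count as one half). This equals the normalized
    Mann-Whitney U statistic."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # compare every positive case with every negative case
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, NOT the study's data: observed labels and model scores
    # (estimated probability of being at risk) for ten students.
    y_true = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
    scores = [0.95, 0.85, 0.8, 0.7, 0.6, 0.45, 0.4, 0.3, 0.2, 0.1]
    y_pred = [int(s >= 0.5) for s in scores]  # 0.5 cut-off is an assumption
    print(f"accuracy  = {accuracy(y_true, y_pred):.3f}")   # 0.700
    print(f"precision = {precision(y_true, y_pred):.3f}")  # 0.600
    print(f"ROC AUC   = {roc_auc(y_true, scores):.3f}")    # 0.708
```

The pairwise loop in `roc_auc` is quadratic in the number of students, which is negligible for a cohort of a few hundred (the study used 415); for much larger datasets, a sort-based rank computation yields the same value in O(n log n). Note that AUC uses the raw scores, so it does not depend on the chosen threshold, whereas accuracy and precision do.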




How to Cite
Ayala Franco, E., López Martínez, R. E., & Menéndez Domínguez, V. H. (2021). Predictive models of academic risk in computing careers with educational data mining. Distance Education Journal, 21(66).
Learning Engineering and Instructional Engineering
