Using AI-powered multiple-choice question generation for self-regulated learning

Enrique Barra; Anabel Pilicita; Javier Conde; Sonsoles López‑Pernas; Pedro Reviriego; Alejandro Pozo

Authors

Enrique Barra Universidad Politécnica de Madrid https://orcid.org/0000-0001-9532-8962
Anabel Pilicita Universidad Politécnica de Madrid https://orcid.org/0000-0002-0796-7797
Javier Conde Universidad Politécnica de Madrid https://orcid.org/0000-0002-5304-0626
Sonsoles López‑Pernas University of Eastern Finland https://orcid.org/0000-0002-9621-1392
Pedro Reviriego Universidad Politécnica de Madrid https://orcid.org/0000-0003-2540-5234
Alejandro Pozo Universidad Politécnica de Madrid https://orcid.org/0000-0002-2160-1978

Keywords: AI-Generated Multiple-Choice Questions, Generative AI in Education, Adaptive Learning, AI in Higher Education, Large Language Models

Supporting Agencies

Agencia Estatal de Investigación (AEI) 10.13039/501100011033, through the FuN-4Date project (Grant PID2022-136684OB-C22)
European Commission, through the Chips Act Joint Undertaking project SMARTY (Grant no. 101140087)
TUCAN6-CM (TEC-2024/COM-460), funded by CM (ORDEN 5696/2024)

Abstract

This study examines the integration of generative AI in education, specifically evaluating AI-generated multiple-choice questions (MCQs) and their role in supporting self-regulated learning (SRL). Over two years, 325 of the 593 enrolled students (54.8%) across four computing courses (Web Technologies and Databases) used AIQUIZ, an open-source AI-driven platform, and generated 38,752 MCQs. An explanatory sequential mixed-methods design analysed student performance, error reports, survey insights, and expert evaluations. Overall student performance was 70.79% (79.45% in Databases, 66.84% in Web Technologies). Only 0.85% of questions were flagged by students as potentially incorrect, a figure that reflects user perception rather than a verified error rate. Surveys indicated strong student acceptance, engagement, and motivation, which are vital for the forethought phase of SRL. However, error analysis of the flagged items revealed recurring issues such as incorrectly marked answers and flawed distractors. These findings suggest that AI-generated MCQs may support the SRL cycle by facilitating forethought, performance control, and self-reflection. While Large Language Model (LLM) tools provide scalable opportunities for practice and self-assessment, our results confirm that human validation remains essential to ensure content quality and maximize learning benefits.
