Multiple Imputation of missing values in exploratory factor analysis of multidimensional scales: estimating latent trait scores

Urbano Lorenzo-Seva, Joost R. Van Ginkel


Researchers frequently have to analyze scales in which some participants have failed to respond to some items. In this paper we focus on the exploratory factor analysis of multidimensional scales (i.e., scales that consist of a number of subscales) in which each subscale is made up of a number of Likert-type items and the aim of the analysis is to estimate participants’ scores on the corresponding latent traits. Our approach consists of the following steps: (1) multiple imputation creates several copies of the data in which the missing values are imputed; (2) each copy of the data is factor analyzed independently, and the same number of factors is extracted from every copy; (3) all factor solutions are simultaneously rotated, orthogonally or obliquely, so that they are both (a) factorially simple and (b) as similar to one another as possible; (4) latent trait scores are estimated for ordinal data in each copy; and (5) each participant’s score on a latent trait is estimated as the average of the estimates obtained across the copies. We applied the approach to a real dataset in which missing responses were artificially introduced following a real pattern of non-responses, and in a simulation study based on artificial datasets. The results show that our approach was able to compute factor score estimates even for participants who have missing data.
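The five steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. Several simplifications are assumed: the hot-deck step draws a random donor from the same item, factors are extracted as principal components of the Pearson correlation matrix (not from an ordinal model), the consensus rotation of step (3) is replaced by an orthogonal Procrustes rotation of each solution toward the first copy's solution (so factorial simplicity is not optimized), and the scores are ordinary regression scores.

```python
import numpy as np

def hot_deck_impute(data, rng):
    """Step 1 (simplified): fill each NaN with a random observed value of the same item."""
    filled = data.copy()
    for j in range(data.shape[1]):
        missing = np.isnan(data[:, j])
        if missing.any():
            donors = data[~missing, j]
            filled[missing, j] = rng.choice(donors, size=int(missing.sum()))
    return filled

def extract_loadings(data, n_factors):
    """Step 2 (simplified): principal-component loadings of the Pearson correlations."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:n_factors]
    return eigvecs[:, top] * np.sqrt(eigvals[top])

def procrustes_align(loadings, target):
    """Step 3 (stand-in): orthogonal rotation T minimizing ||loadings @ T - target||."""
    u, _, vt = np.linalg.svd(loadings.T @ target)
    return loadings @ (u @ vt)

def regression_scores(data, loadings):
    """Step 4 (simplified): regression scores Z R^{-1} L for orthogonal factors."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = np.corrcoef(data, rowvar=False)
    weights = np.linalg.solve(corr, loadings)
    return z @ weights

def pooled_scores(data, n_factors, n_copies=5, seed=0):
    """Steps 1 to 5: impute, factor, align, score, then average across copies."""
    rng = np.random.default_rng(seed)
    copies = [hot_deck_impute(data, rng) for _ in range(n_copies)]
    loadings = [extract_loadings(c, n_factors) for c in copies]
    target = loadings[0]  # assumption: the first copy serves as the rotation target
    aligned = [procrustes_align(l, target) for l in loadings]
    scores = [regression_scores(c, l) for c, l in zip(copies, aligned)]
    return np.mean(scores, axis=0)  # Step 5: one averaged score row per participant
```

Calling `pooled_scores(items, n_factors=2)` on a participants-by-items array with `np.nan` marking non-responses returns one row of averaged scores per participant, including participants with missing responses. Aligning the solutions before averaging matters because independently extracted factors can differ in order and sign from one copy to the next.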


Missing data; Hot-Deck imputation; Predictive mean matching imputation; Multiple imputation; Consensus Rotation; Factor scores; Exploratory factor analysis.






Copyright (c) 2016 Servicio de Publicaciones de la Universidad de Murcia
