Validation of psychometric instruments in social and health sciences: a practical guide
Abstract
The number of psychometric studies has grown markedly in recent years, alongside crucial statistical advances for evaluating the reliability and validity of tests. Given the importance of providing more accurate procedures both in methodology and in the interpretation of scores, the editors of the journal Anales de Psicología propose this guide to address the most relevant topics in applied psychometrics. To this end, the present manuscript analyzes the main topics of Classical Test Theory (e.g., exploratory/confirmatory factor analysis, reliability, validity, and bias) in order to synthesize and clarify their practical applications and to raise the publication standards for this kind of work.
Copyright 2024 Servicio de Publicaciones, Universidad de Murcia (Spain)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license.
Works published in this journal are subject to the following terms:
1. The Servicio de Publicaciones de la Universidad de Murcia (the publisher) retains the economic rights (copyright) to the published works, and encourages and permits their reuse under the license indicated in point 2.
© Servicio de Publicaciones, Universidad de Murcia, 2024
2. Works are published in the electronic edition of the journal under a Creative Commons Attribution-ShareAlike 4.0 International license (legal text). They may be copied, used, disseminated, transmitted, and publicly displayed, provided that: i) the authorship and the original source of publication (journal, publisher, and URL of the work) are cited; ii) they are not used for commercial purposes; iii) the existence and specifications of this license are mentioned.
3. Self-archiving conditions. Authors are permitted and encouraged to disseminate electronically the pre-print (the version before it has been reviewed and submitted to the journal) and/or post-print (the version reviewed and accepted for publication) versions of their works before publication, since this favors earlier circulation and dissemination and, with it, a possible increase in citations and reach within the academic community. RoMEO color: green.