Validation of Psychometric Instruments with Classical Test Theory in Social and Health Sciences: A practical guide

Jose-Antonio López-Pina; Alejandro Veas

doi:10.6018/analesps.583991

Authors

Jose-Antonio López-Pina, Department of Basic Psychology and Methodology, University of Murcia (Spain)
Alejandro Veas, Department of Developmental and Educational Psychology, University of Murcia (Spain), https://orcid.org/0000-0002-5560-2215

Keywords: Psychometric studies, Reliability, Validity, Factor analysis

Abstract

Recently, the number of psychometric studies has increased significantly, alongside crucial statistical advances for evaluating the reliability and validity of tests. Given the importance of providing more accurate procedures in both methodology and score interpretation, the editors of the journal Anales de Psicología propose this guide to address the most relevant topics in the field of applied psychometrics. To this end, the present manuscript analyzes the main topics of Classical Test Theory (e.g., exploratory/confirmatory factor analysis, reliability, validity, and bias) with a view to synthesizing and clarifying practical applications and improving the publication standards for such studies.
