How to proceed when normality and sphericity are violated in the repeated measures ANOVA

Maria José Blanca; Rafael Alarcón; Jaume Arnau; Javier García-Castro; Roser Bono

doi:10.6018/analesps.594291

Authors

Maria José Blanca Department of Psychobiology and Behavioral Sciences Methodology, University of Malaga https://orcid.org/0000-0003-4046-9308
Rafael Alarcón Department of Psychobiology and Behavioral Sciences Methodology, University of Malaga https://orcid.org/0000-0003-2122-1374
Jaume Arnau Department of Social Psychology and Quantitative Psychology, University of Barcelona https://orcid.org/0000-0002-5325-5045
Javier García-Castro Department of Psychology, Universidad Loyola Andalucía https://orcid.org/0000-0003-1851-423X
Roser Bono Department of Social Psychology and Quantitative Psychology, University of Barcelona; Institute of Neurosciences, University of Barcelona https://orcid.org/0000-0001-7991-6668

Keywords: Greenhouse-Geisser adjustment, Huynh-Feldt adjustment, Monte Carlo simulation, Robustness, Power

Supporting Agencies

This research was supported by grant PID2020-113191GB-I00, awarded through MCIN/AEI/10.13039/501100011033.

Abstract

Adjusted F-tests have typically been proposed as an alternative to the F-statistic in repeated measures ANOVA. Despite considerable research, it remains unclear how these statistics perform under simultaneous violation of normality and sphericity. Accordingly, our aim here was to conduct a detailed examination of Type I error and power of the F-statistic and the Greenhouse-Geisser (F-GG) and Huynh-Feldt (F-HF) adjustments, manipulating the number of repeated measures (3-6), sample size (10-300), sphericity (Greenhouse-Geisser epsilon estimator, from its lower to upper limit), and distribution shape (slight to extreme deviations from normality). The findings show that the behavior of F-GG and F-HF depends on the degree of violation of both normality and sphericity, and on sample size. Overall, we suggest using F-GG under violation of sphericity and slight or moderate deviations from normality at all sample sizes; with severe deviations from both normality and sphericity, F-GG may be used with a sample size larger than 10; and with extreme deviations from both normality and sphericity, this statistic may be used with a sample size larger than 30. In the event of discrepant results between F-GG and F-HF, the choice depends on the epsilon value.
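
As an illustration of the quantities named in the abstract, the following minimal sketch computes the unadjusted F-statistic for a one-way repeated measures design together with the Greenhouse-Geisser and Huynh-Feldt epsilon estimates. It is not the authors' simulation code. The function name `rm_anova_epsilons` and the example data are hypothetical, and the Huynh-Feldt expression used is the usual single-group form, which is an assumption about the design. Both adjusted tests refer the same F value to an F distribution whose degrees of freedom are multiplied by the corresponding epsilon.

```python
def rm_anova_epsilons(data):
    """One-way repeated measures ANOVA quantities (illustrative sketch).

    data: list of n subject rows, each holding k repeated measures.
    Returns the unadjusted F, the GG and HF epsilon estimates, and the
    epsilon-adjusted degrees of freedom for each correction.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    row_means = [sum(row) / k for row in data]

    # Unadjusted F-statistic: treatment MS over residual (error) MS
    ss_treat = n * sum((m - grand) ** 2 for m in col_means)
    ss_subj = k * sum((m - grand) ** 2 for m in row_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_treat - ss_subj
    df1, df2 = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_treat / df1) / (ss_error / df2)

    # Sample covariance matrix of the k repeated measures
    cov = [[sum((row[a] - col_means[a]) * (row[b] - col_means[b])
                for row in data) / (n - 1)
            for b in range(k)] for a in range(k)]

    # Double-center the covariance matrix (remove row, column, grand means)
    r_mean = [sum(cov[a]) / k for a in range(k)]
    g_mean = sum(r_mean) / k
    cen = [[cov[a][b] - r_mean[a] - r_mean[b] + g_mean
            for b in range(k)] for a in range(k)]

    # Greenhouse-Geisser epsilon: ranges from 1/(k-1) up to 1 (sphericity)
    trace = sum(cen[a][a] for a in range(k))
    sum_sq = sum(v * v for row in cen for v in row)
    eps_gg = trace ** 2 / ((k - 1) * sum_sq)

    # Huynh-Feldt epsilon (single-group form, an assumption), capped at 1
    eps_hf = (n * (k - 1) * eps_gg - 2) / ((k - 1) * (n - 1 - (k - 1) * eps_gg))
    eps_hf = min(1.0, eps_hf)

    return {
        "F": f_stat,
        "eps_gg": eps_gg,
        "eps_hf": eps_hf,
        "df_gg": (eps_gg * df1, eps_gg * df2),
        "df_hf": (eps_hf * df1, eps_hf * df2),
    }


if __name__ == "__main__":
    # Hypothetical scores: 5 participants measured on 3 occasions
    scores = [[1, 2, 3], [2, 4, 3], [3, 3, 7], [4, 7, 6], [5, 6, 10]]
    for name, value in rm_anova_epsilons(scores).items():
        print(name, value)
```

When the two epsilon estimates lead to different conclusions, the abstract's recommendation is to choose between F-GG and F-HF according to the epsilon value, which is why the sketch returns both estimates side by side.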
