Robustness of Generalized Linear Mixed Models for Split-Plot Designs with Binary Data
Abstract
This paper examined the robustness of the generalized linear mixed model (GLMM). The GLMM estimates fixed and random effects, and it is especially useful when the dependent variable is binary. It is also useful when the dependent variable involves repeated measures, since it can model the correlation between them. The present study used Monte Carlo simulation to analyze the empirical Type I error rates of GLMMs in split-plot designs. The variables manipulated were sample size, group size, number of repeated measures, and correlation between repeated measures. Extreme conditions were also considered, including small samples, unbalanced groups, and a different correlation in each group (pairing between group size and correlation between repeated measures). For balanced groups, the results showed that the test of the group effect was robust under all conditions, while for unbalanced groups it tended to be conservative with positive pairing and liberal with negative pairing. Regarding time and interaction effects, the results showed, for both balanced and unbalanced groups, that (a) the test was robust with low correlation (.2) but conservative for medium values of correlation (.4 and .6), and (b) the test tended to be conservative for positive and negative pairing, especially the latter.
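The Monte Carlo logic summarized above can be sketched in a few lines of Python. This is an illustration only, not the study's procedure: the group-effect test is a simple large-sample z-test on subject means standing in for the GLMM, the correlated binary data come from a shared-outcome mixture, and all function names, default values, and the robustness interval (0.5α to 1.5α, Bradley's liberal criterion) are assumptions introduced for this example.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_subject(n_times, p, rho, rng):
    """Binary repeated measures with P(Y=1)=p and exchangeable correlation rho."""
    shared = int(rng.random() < p)   # outcome shared across occasions
    mix = math.sqrt(rho)             # chance an occasion copies the shared outcome
    return [
        shared if rng.random() < mix else int(rng.random() < p)
        for _ in range(n_times)
    ]


def group_effect_p_value(group_a, group_b):
    """Stand-in test (NOT a GLMM): two-sided z-test on subject-level means."""
    means_a = [sum(y) / len(y) for y in group_a]
    means_b = [sum(y) / len(y) for y in group_b]
    se = math.sqrt(variance(means_a) / len(means_a)
                   + variance(means_b) / len(means_b))
    if se == 0.0:                    # degenerate sample: no evidence of a difference
        return 1.0
    z = (mean(means_a) - mean(means_b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def empirical_type1_error(n_per_group=(30, 30), rho_per_group=(0.4, 0.4),
                          n_times=4, p=0.5, alpha=0.05, n_reps=2000, seed=1):
    """Proportion of null-hypothesis replications in which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        groups = [
            [simulate_subject(n_times, p, rho, rng) for _ in range(n)]
            for n, rho in zip(n_per_group, rho_per_group)
        ]
        if group_effect_p_value(groups[0], groups[1]) < alpha:
            rejections += 1
    return rejections / n_reps


def classify(rate, alpha=0.05):
    """Assumed criterion: robust if the rate lies within [0.5*alpha, 1.5*alpha]."""
    if rate < 0.5 * alpha:
        return "conservative"
    if rate > 1.5 * alpha:
        return "liberal"
    return "robust"


if __name__ == "__main__":
    rate = empirical_type1_error()
    print(f"empirical Type I error: {rate:.3f} ({classify(rate)})")
```

Unbalanced groups and group-specific correlations can be explored by passing, for example, `n_per_group=(20, 40)` together with unequal values in `rho_per_group`. Because the test here is not a GLMM, the resulting rates say nothing about the findings reported in the abstract.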
Copyright (c) 2023 Servicio de Publicaciones, University of Murcia (Spain)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The works published in this journal are subject to the following terms:
1. The Publications Service of the University of Murcia (the publisher) retains the property rights (copyright) of published works, and encourages and enables the reuse of the same under the license specified in paragraph 2.
2. The works are published in the online edition of the journal under a Creative Commons Attribution-ShareAlike 4.0 license (legal text). You may copy, use, distribute, transmit, and publicly display them, provided that: i) you cite the author and the original source of publication (journal, publisher, and URL of the work), ii) they are not used for commercial purposes, and iii) you mention the existence and specifications of this license.
3. Conditions of self-archiving. Authors are allowed and encouraged to disseminate electronically the pre-print (the version before it is evaluated and sent to the journal) and/or post-print (the version reviewed and accepted for publication) of their works before publication, as this encourages earlier circulation and dissemination and thus a possible increase in citations and reach within the academic community. RoMEO Color: Green.