Research article

Unbiased random effects estimation in generalized linear mixed models via likelihood-based boosting

  • Published: 19 January 2026
  • MSC : 62J07

  • Boosting techniques are a popular alternative to conventional methods for estimating covariate effects in generalized linear mixed models. In addition to functioning in high-dimensional data setups, they also offer variable selection. The established framework for boosting in GLMMs tends to exhibit problematic behavior in the selection process when cluster-constant covariates are present, which leads to incorrect estimates. We propose an improved algorithm that rectifies this issue by reworking the updating process for the random effects, an approach that has already proved successful for linear mixed models, i.e., normally distributed outcomes, and we provide additional insights regarding the computation of model complexity. We demonstrate the improvements in the quality of the estimates via various simulations and data examples.
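    The general idea described in the abstract — component-wise selection of covariate updates combined with a separate, shrunken update of the random effects in each boosting step — can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a simplified, self-contained toy version for a logistic random-intercept model, where all simulation settings (50 clusters, step length 0.1, ridge penalty 5.0) are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Simulated clustered binary data: 50 clusters of 20 observations,
    # two informative covariates and one pure-noise covariate.
    n_clusters, n_per = 50, 20
    n = n_clusters * n_per
    cluster = np.repeat(np.arange(n_clusters), n_per)
    X = rng.normal(size=(n, 3))
    b_true = rng.normal(scale=0.8, size=n_clusters)      # random intercepts
    eta_true = 1.2 * X[:, 0] - 0.8 * X[:, 1] + b_true[cluster]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))

    # Component-wise boosting sketch: in each step, fit every covariate to
    # the negative gradient (y - p), select the best-fitting one, take a
    # shrunken update, then refresh the random intercepts via a
    # ridge-penalised per-cluster update.
    nu, n_steps, lam = 0.1, 300, 5.0    # step length, iterations, penalty (assumed)
    beta = np.zeros(3)
    b = np.zeros(n_clusters)
    intercept = 0.0

    for _ in range(n_steps):
        eta = intercept + X @ beta + b[cluster]
        p = 1.0 / (1.0 + np.exp(-eta))
        u = y - p                        # negative gradient / working residual
        intercept += nu * u.mean()
        # component-wise selection: least-squares slope of u on each column
        slopes = X.T @ u / (X ** 2).sum(axis=0)
        sse = [((u - X[:, j] * slopes[j]) ** 2).sum() for j in range(3)]
        j = int(np.argmin(sse))
        beta[j] += nu * slopes[j]
        # shrunken random-intercept update: penalised per-cluster residual means
        for k in range(n_clusters):
            m = cluster == k
            b[k] += nu * u[m].sum() / (m.sum() + lam)
        b -= b.mean()                    # identifiability: centre the intercepts

    print(np.round(beta, 2))
    ```

    Because the updates are shrunken, the fitted coefficients are biased toward zero relative to the true values (1.2, -0.8, 0), but the noise covariate is rarely selected — the variable-selection property the abstract refers to.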

    Citation: Johanna Gerstmeyer, Elisabeth Bergherr, Colin Griesbach. Unbiased random effects estimation in generalized linear mixed models via likelihood-based boosting[J]. AIMS Mathematics, 2026, 11(1): 1675-1700. doi: 10.3934/math.2026070



  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

Figures(9)  /  Tables(7)