Choice of link and variance function for generalized linear mixed models: a case study with binomial response in proteomics. (English) Zbl 1511.62357

Summary: Non-normality is a common phenomenon in data from agricultural and biological research, especially in molecular data (for example, -omics, RNA-seq, and flow cytometric data). For over half a century, the leading paradigm called for using analysis of variance (ANOVA) after applying a data transformation. The introduction of generalized linear mixed models (GLMMs) provides a new way of analyzing non-normal data. Selecting an apt link function in a GLMM can be quite influential, however, and is as critical as selecting an appropriate transformation for ANOVA. In this paper, we assess the performance of different parametric link families available in the literature. Then, we propose a new estimation method for selecting an appropriate link function with a suitable variance function in a quasi-likelihood framework. We apply these methods to a proteomics data set, showing that GLMMs provide a very flexible framework for analyzing these kinds of data.
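
To make the idea of a parametric link family concrete, the following is a minimal Python sketch of one well-known family for binomial responses, the asymmetric family of Aranda-Ordaz (reference [1]). It is an illustration only: the summary does not say which families the paper assesses or how they are parameterized, so the choice of family, the function names, and the restriction to a shape parameter `lam >= 0` are assumptions made here. The family maps a probability mu to the linear predictor via eta = log(((1 - mu)^(-lam) - 1) / lam), which is the logit link at `lam = 1` and tends to the complementary log-log link as `lam` approaches 0.

```python
import math


def aranda_ordaz_link(mu, lam):
    """Asymmetric Aranda-Ordaz link (illustrative; lam >= 0 assumed).

    eta = log(((1 - mu)**(-lam) - 1) / lam)
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    if lam < 0.0:
        raise ValueError("this sketch only covers lam >= 0")
    if lam == 0.0:
        # Limiting case lam -> 0: complementary log-log link.
        return math.log(-math.log1p(-mu))
    return math.log(((1.0 - mu) ** (-lam) - 1.0) / lam)


def aranda_ordaz_inverse(eta, lam):
    """Inverse link: mu = 1 - (1 + lam * exp(eta))**(-1 / lam)."""
    if lam < 0.0:
        raise ValueError("this sketch only covers lam >= 0")
    if lam == 0.0:
        # Inverse of the complementary log-log link.
        return 1.0 - math.exp(-math.exp(eta))
    return 1.0 - (1.0 + lam * math.exp(eta)) ** (-1.0 / lam)


def logit(mu):
    """Standard logit link, used here only for comparison."""
    return math.log(mu / (1.0 - mu))


def cloglog(mu):
    """Standard complementary log-log link, used here only for comparison."""
    return math.log(-math.log1p(-mu))
```

For example, `aranda_ordaz_link(0.3, 1.0)` agrees with `logit(0.3)`, and `aranda_ordaz_link(0.3, 1e-6)` is close to `cloglog(0.3)`. Treating `lam` as an extra parameter to be estimated from the data is what turns the choice of link into an estimation problem rather than a fixed modeling decision.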

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62J10 Analysis of variance and covariance (ANOVA)
62J12 Generalized linear models (logistic models)
62F12 Asymptotic properties of parametric estimators

Software:

STUKEL
Full Text: DOI

References:

[1] Aranda-Ordaz, F. J., On two families of transformations to additivity for binary response data, Biometrika, 68, 2, 357-63 (1981) · Zbl 0466.62098 · doi:10.2307/2335580
[2] Atkinson, A. C., Plots, transformations and regression (1985), London: Clarendon · Zbl 0582.62065
[3] Baldi, I.; Maule, M.; Bigi, R.; Cortigiani, L.; Bo, S.; Gregori, D., Some notes on parametric link functions in clinical research, Statistical Methods in Medical Research, 18, 2, 131-44 (2009) · doi:10.1177/0962280208088624
[4] Bolker, B. M.; Brooks, M. E.; Clark, C. J.; Geange, S. W.; Poulsen, J. R.; Stevens, M. H. M.; White, J. S., Generalized linear mixed models: A practical guide for ecology and evolution, Trends in Ecology and Evolution, 24, 3, 127-35 (2009) · doi:10.1016/j.tree.2008.10.008
[5] Cox, D. R.; Reid, N., Parameter orthogonality and approximate conditional inference, Journal of the Royal Statistical Society: Series B, 49, 1-39 (1987) · Zbl 0616.62006
[6] Crowder, M. J., Beta-binomial ANOVA for proportions, Journal of the Royal Statistical Society: Series C, 27, 34-7 (1978) · doi:10.2307/2346223
[7] Czado, C., Link misspecification and data selected transformations in binary regression models. Ph.D. Thesis (1989), School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY
[8] Czado, C.; Santner, T. J., Orthogonalizing parametric link transformation families in binary regression analysis, Canadian Journal of Statistics, 20, 1, 51-62 (1992) · Zbl 0761.62100 · doi:10.2307/3315574
[9] Czado, C., Parametric link modification of both tails in binary regression, Statistical Papers, 35, 1, 189-201 (1994) · Zbl 0807.62052 · doi:10.1007/BF02926413
[10] Czado, C., On selecting parametric link transformation families in generalized linear models, Journal of Statistical Planning and Inference, 61, 1, 125-39 (1997) · Zbl 0879.62060 · doi:10.1016/S0378-3758(96)00150-4
[11] Guerrero, V. M.; Johnson, R. A., Use of the Box-Cox transformation with binary response models, Biometrika, 69, 2, 309-14 (1982) · doi:10.1093/biomet/69.2.309
[12] Holm, S., A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics, 6, 65-70 (1979) · Zbl 0402.62058
[13] Jørgensen, B., Exponential dispersion models (with discussion), Journal of the Royal Statistical Society: Series B, 49, 127-62 (1987) · Zbl 0662.62078 · doi:10.1111/j.2517-6161.1987.tb01685.x
[14] Kerppola, T. K., Bimolecular fluorescence complementation (BiFC) analysis as a probe of protein interactions in living cells, Annual Review of Biophysics, 37, 465-87 (2008) · doi:10.1146/annurev.biophys.37.032807.125842
[15] Lange, K., Numerical analysis for statisticians (1999), New York: Springer-Verlag · Zbl 0920.62001
[16] Lee, Y.; Nelder, J. A.; Pawitan, Y., Generalized linear models with random effects (2006), London: Chapman & Hall/CRC · Zbl 1110.62092
[17] Llorca, C. M.; Berendzen, K. W.; Malik, W. A.; Mahn, S.; Piepho, H.-P.; Zentgraf, U., The elucidation of the interactome of 16 Arabidopsis bZIP factors reveals three independent functional networks, PLoS ONE, 10, 10, e0139884 (2015) · doi:10.1371/journal.pone.0139884
[18] McCullagh, P.; Nelder, J. A., Generalized linear models (1989), London: Chapman & Hall · Zbl 0744.62098
[19] Nelder, J. A.; Wedderburn, R. W., Generalized linear models, Journal of the Royal Statistical Society: Series A, 135, 3, 370-84 (1972) · doi:10.2307/2344614
[20] Pearson, E. S., Bayes’ theorem, examined in the light of experimental sampling, Biometrika, 17, 3-4, 388-442 (1925) · JFM 51.0383.02 · doi:10.1093/biomet/17.3-4.388
[21] Piepho, H. P., The folded exponential transformation for proportions, Journal of the Royal Statistical Society: Series D (the Statistician), 52, 575-89 (2003) · doi:10.1046/j.0039-0526.2003.00509.x
[22] Piepho, H. P., Ridge regression and extensions for genome-wide selection in maize, Crop Science, 49, 4, 1165-76 (2009) · doi:10.2135/cropsci2008.10.0595
[23] Pinheiro, J. C.; Bates, D. M., Approximations to the loglikelihood function in the nonlinear mixed effects model, Journal of Computational and Graphical Statistics, 4, 12-35 (1995) · doi:10.2307/1390625
[24] Pregibon, D., Goodness of link tests for generalized models, Journal of the Royal Statistical Society: Series C, 29, 15-24 (1980) · Zbl 0434.62048 · doi:10.2307/2346405
[25] Prentice, R. L., Discrimination among some parametric models, Biometrika, 62, 607-14 (1975) · Zbl 0328.62013
[26] Prentice, R. L., A generalization of the Probit and Logit methods for dose response curves, Biometrics, 32, 761-68 (1976)
[27] Skellam, J. G., A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials, Journal of the Royal Statistical Society: Series B, 10, 257-61 (1948) · Zbl 0032.41903 · doi:10.1111/j.2517-6161.1948.tb00014.x
[28] Stukel, T., Generalized logistic models, Journal of the American Statistical Association, 83, 402, 426-31 (1988) · doi:10.1080/01621459.1988.10478613
[29] Warton, D. I.; Hui, F., The arcsine is asinine: the analysis of proportions in ecology, Ecology, 92, 1, 3-10 (2011)
[30] Walter, M.; Chaban, C.; Schütze, K.; Batistic, O.; Weckermann, K.; Näke, C.; Blazevic, D.; Grefen, C.; Schumacher, K.; Oecking, C., Visualization of protein interactions in living plant cells using bimolecular fluorescence complementation, The Plant Journal: For Cell and Molecular Biology, 40, 3, 428-38 (2004) · doi:10.1111/j.1365-313X.2004.02219.x
[31] Wedderburn, R., Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method, Biometrika, 61, 439-47 (1974) · Zbl 0292.62050 · doi:10.1093/biomet/61.3.439
[32] Wolfinger, R.; O’Connell, M., Generalized linear mixed models: a pseudo-likelihood approach, Journal of Statistical Computation and Simulation, 48, 3-4, 233-43 (1993) · Zbl 0833.62067 · doi:10.1080/00949659308811554
[33] Young, L. J.; Campbell, N. L.; Capuano, G. A., Analysis of overdispersed count data from single-factor experiments: A comparative study, Journal of Agricultural, Biological and Environmental Statistics, 4, 3, 258-75 (1999) · doi:10.2307/1400385