
Augmented beta rectangular regression models: a Bayesian perspective. (English) Zbl 1403.62215

Summary: Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportional data ranging between zero and one. However, Beta distributions are not robust to extreme outliers or excessive events in the tail areas, and they do not account for the boundary values zero and one, because these values lie outside the support of the Beta distribution. To address these issues, we propose a mixed effects model using the Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson’s Disease (PD) Long-term Study-1 (LS-1 study, \(n = 1741\)), developed by The National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson’s Disease (NINDS NET-PD) network.
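
The augmentation described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common form of the Beta rectangular density as a mixture of a Uniform(0, 1) and a Beta density, with the Beta component parameterized by a mean `mu` and precision `phi`, and it places point masses `p0` and `p1` at zero and one. The function and parameter names are hypothetical, and the paper may use a different parameterization.

```python
import math


def beta_pdf(y, a, b):
    """Density of Beta(a, b) at a point y strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))


def beta_rectangular_pdf(y, mu, phi, theta):
    """Assumed form: weight theta on Uniform(0, 1), weight 1 - theta on
    Beta(mu * phi, (1 - mu) * phi). Here mu is the mean of the Beta
    component only, not of the whole mixture."""
    return theta * 1.0 + (1 - theta) * beta_pdf(y, mu * phi, (1 - mu) * phi)


def augmented_density(y, p0, p1, mu, phi, theta):
    """Point mass p0 at 0 and p1 at 1; otherwise the Beta rectangular
    density scaled by the remaining probability 1 - p0 - p1."""
    if y == 0:
        return p0
    if y == 1:
        return p1
    return (1 - p0 - p1) * beta_rectangular_pdf(y, mu, phi, theta)
```

The uniform component keeps the density bounded away from zero near the boundaries, which is one way to see why such a model can be less sensitive to observations in the tails than a plain Beta model.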

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
62J12 Generalized linear models (logistic models)

Software:

GAMLSS; Stan

References:

[1] Bayes, C. L., Bazán, J. L. and García, C. (2012). A new robust regression model for proportions. Bayesian Analysis 7, 841-866. · Zbl 1330.62272
[2] Cancho, V. G., Dey, D. K., Lachos, V. H. and Andrade, M. G. (2011). Bayesian nonlinear regression models with scale mixtures of skew‐normal distributions: estimation and case influence diagnostics. Computational Statistics and Data Analysis 55, 588-602. · Zbl 1247.62083
[3] Carlin, B. P. and Louis, T. A. (2009). Bayesian Methods for Data Analysis. Chapman & Hall/CRC, Boca Raton, FL.
[4] Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829-836. · Zbl 0423.62029
[5] Dey, D. K., Chen, M. H. and Chang, H. (1997). Bayesian approach for nonlinear random effects models. Biometrics 53, 1239-1252. · Zbl 0911.62024
[6] Diggle, P., Heagerty, P., Liang, K. Y. and Zeger, S. (2002). Analysis of Longitudinal Data. Oxford, UK: Oxford University Press.
[7] Duane, S., Kennedy, A. D., Pendleton, B. J. and Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B 195, 216-222.
[8] Dunson, D. D. (2007). Bayesian methods for latent trait modelling of longitudinal data. Statistical Methods in Medical Research 16, 399-415.
[9] Elm, J. J. (2012). Design innovations and baseline findings in a long‐term Parkinson’s trial: the National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson’s Disease Long‐Term Study-1. Movement Disorders 27, 1513-1521.
[10] Escobar, M. D. (1994). Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association 89, 268-277. · Zbl 0791.62039
[11] Ferrari, S. and Cribari‐Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics 31, 799-815. · Zbl 1121.62367
[12] Figueroa‐Zúniga, J. I., Arellano‐Valle, R. B. and Ferrari, S. L. P. (2013). Mixed beta regression: a Bayesian perspective. Computational Statistics and Data Analysis 61, 137-147. · Zbl 1348.62194
[13] Galvis, D. M., Bandyopadhyay, D. and Lachos, V. H. (2014). Augmented mixed beta regression models for periodontal proportion data. Statistics in Medicine 33, 3759-3771.
[14] García, C. B., Pérez, J. G. and van Dorp, J. R. (2011). Modeling heavy‐tailed, skewed and peaked uncertainty phenomena with bounded support. Statistical Methods and Applications 20, 463-486. · Zbl 1238.62012
[15] Geisser, S. (1993). Predictive Inference. Vol. 55. CRC Press, Boca Raton, FL. · Zbl 0824.62001
[16] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2013). Bayesian Data Analysis. CRC Press, Boca Raton, FL.
[17] Ghosh, P. and Hanson, T. (2010). A semiparametric Bayesian approach to multivariate longitudinal data. Australian and New Zealand Journal of Statistics 52, 275-288. · Zbl 1373.62381
[18] Hahn, E. D. (2008). Mixture densities for project management activity times: a robust approach to PERT. European Journal of Operational Research 188, 450-459. · Zbl 1149.90351
[19] Hatfield, L. A., Boye, M. E. and Carlin, B. P. (2011). Joint modeling of multiple longitudinal patient‐reported outcomes and survival. Journal of Biopharmaceutical Statistics 21, 971-991.
[20] Hatfield, L. A., Boye, M. E., Hackshaw, M. D. and Carlin, B. P. (2012). Multilevel Bayesian models for survival times and longitudinal patient‐reported outcomes with many zeros. Journal of the American Statistical Association 107, 875-885. · Zbl 1443.62382
[21] Hoffman, M. D. and Gelman, A. (2014). The no‐U‐turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15, 1593-1623. · Zbl 1319.60150
[22] Jacqmin‐Gadda, H., Sibillot, S., Proust, C., Molina, J.‐M. and Thiébaut, R. (2007). Robustness of the linear mixed model to misspecified error distribution. Computational Statistics and Data Analysis 51, 5142-5154. · Zbl 1162.62319
[23] Johnson, N. L., Kotz, S. and Balakrishnan, N. (1994). Continuous Univariate Distributions, Vol. 2. John Wiley & Sons, Hoboken, New Jersey. · Zbl 0811.62001
[24] Kieburtz, K., Tilley, B. C., Elm, J. J., et al. (2015). Effect of creatine monohydrate on clinical progression in patients with Parkinson disease: a randomized clinical trial. JAMA 313, 584-593.
[25] Kieschnick, R. and McCullough, B. D. (2003). Regression analysis of variates observed on (0, 1): percentages, proportions and fractions. Statistical Modelling 3, 193-213. · Zbl 1070.62056
[26] Lee, S.‐Y. and Song, X.‐Y. (2004). Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes. Multivariate Behavioral Research 39, 653-686.
[27] Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis With Missing Data. John Wiley & Sons, Hoboken, New Jersey. · Zbl 1011.62004
[28] McCulloch, C. E., Neuhaus, J. M., et al. (2011). Misspecifying the shape of a random effects distribution: why getting it wrong may not matter. Statistical Science 26, 388-402. · Zbl 1246.62169
[29] Neal, R. M. (1994). An improved acceptance procedure for the hybrid Monte Carlo algorithm. Journal of Computational Physics 111, 194-203. · Zbl 0797.65115
[30] Ospina, R. and Ferrari, S. L. P. (2010). Inflated Beta distributions. Statistical Papers 51, 111-126. · Zbl 1247.62043
[31] Peng, F. and Dey, D. K. (1995). Bayesian analysis of outlier problems using divergence measures. Canadian Journal of Statistics 23, 199-213. · Zbl 0833.62028
[32] Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape. Journal of the Royal Statistical Society: Series C 54, 507-554. · Zbl 1490.62201
[33] Rizopoulos, D., Verbeke, G. and Molenberghs, G. (2008). Shared parameter models under random effects misspecification. Biometrika 95, 63-74. · Zbl 1437.62592
[34] Rogers, J. A., Polhamus, D., Gillespie, W. R., Ito, K., Romero, K., Qiu, R., Stephenson, D., Gastonguay, M. R. and Corrigan, B. (2012). Combining patient‐level and summary‐level data for Alzheimer’s disease modeling and simulation: a beta regression meta‐analysis. Journal of Pharmacokinetics and Pharmacodynamics 39, 479-498.
[35] Ruppert, D. (2002). Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics 11, 735-757.
[36] Sinha, D. and Dey, D. K. (1997). Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association 92, 1195-1212. · Zbl 1067.62520
[37] Smithson, M. and Verkuilen, J. (2006). A better lemon squeezer? Maximum‐likelihood regression with beta‐distributed dependent variables. Psychological Methods 11, 54.
[38] Stan Development Team. (2014). Stan Modeling Language Users Guide and Reference Manual, Version 2.6.0.
[39] Stasinopoulos, D. M. and Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software 23, 1-46.
[40] Sun, Y. and Wu, H. (2005). Semiparametric time‐varying coefficients regression model for longitudinal data. Scandinavian Journal of Statistics 32, 21-47. · Zbl 1091.62088
[41] Verkuilen, J. and Smithson, M. (2012). Mixed and mixture regression models for continuous bounded responses using the Beta distribution. Journal of Educational and Behavioral Statistics 37, 82-113.
[42] Vieira, A. M. C., Hinde, J. P. and Demétrio, C. G. B. (2000). Zero‐inflated proportion data models applied to a biological control assay. Journal of Applied Statistics 27, 373-389. · Zbl 0976.62108
[43] Zhao, L., Chen, Y. and Schaffner, D. W. (2001). Comparison of logistic regression and linear regression in modeling percentage data. Applied and Environmental Microbiology 67, 2129-2135.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.