Robust statistical modelling using the multivariate skew \(t\) distribution with complete and incomplete data. (English) Zbl 1218.62050

Summary: Missing data are inevitable in many situations and can hamper data analysis in scientific investigations. We establish flexible analytical tools for multivariate skew \(t\) models when fat-tailed, asymmetric and missing observations occur simultaneously in the input data. For ease of computation and theoretical development, two auxiliary indicator matrices are incorporated into the model to identify the observed and missing components of each observation, which effectively reduces the computational complexity. Under the missing at random assumption, we present a Monte Carlo version of the expectation conditional maximization algorithm, which estimates the parameters and imputes each missing observation with a single value. Additionally, a Metropolis-Hastings within Gibbs sampler with data augmentation is developed to account for the uncertainty of the parameters as well as of the missing outcomes. The methodology is illustrated through two real data sets.
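
The role of the two auxiliary indicator matrices can be illustrated with a minimal sketch. The summary does not give their exact definition, so the construction below is an assumption: for an observation \(y\) of dimension \(p\), the matrix `O` stacks the rows of the identity \(I_p\) that correspond to observed coordinates, and `M` stacks the rows that correspond to missing ones. The function name `indicator_matrices` and the use of NaN to mark missing entries are hypothetical choices made for this illustration.

```python
import numpy as np

def indicator_matrices(y):
    """Build selection matrices for one observation (assumed construction).

    O picks out the observed coordinates of y, M the missing ones.
    Missing entries are assumed to be coded as NaN.
    """
    y = np.asarray(y, dtype=float)
    identity = np.eye(y.shape[0])
    missing = np.isnan(y)
    O = identity[~missing]  # rows of I_p for observed coordinates
    M = identity[missing]   # rows of I_p for missing coordinates
    return O, M

# Example observation with p = 4 and two missing coordinates.
y = np.array([1.5, np.nan, -0.3, np.nan])
O, M = indicator_matrices(y)

# Replace NaN by 0 before multiplying, since 0 * NaN is NaN in floating point.
y_filled = np.nan_to_num(y)
y_obs = O @ y_filled  # observed sub-vector of y

# The two matrices together partition the identity: O'O + M'M = I_p.
partition = O.T @ O + M.T @ M
```

Under this assumed construction, the observed part of a location vector or scale matrix is obtained by the same products (for example `O @ mu` and `O @ Sigma @ O.T` for hypothetical parameters `mu` and `Sigma`), so no per-pattern bookkeeping of indices is needed.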

MSC:

62H10 Multivariate distribution of statistics
62F35 Robustness and adaptive procedures (parametric inference)
65C05 Monte Carlo methods
62H12 Estimation in multivariate analysis
62G15 Nonparametric tolerance and confidence regions

Software:

SAS/STAT
