
Asymptotic normality for inference on multisample, high-dimensional mean vectors under mild conditions. (English) Zbl 1317.62047

Summary: In this paper, we consider asymptotic normality for various inference problems on multisample, high-dimensional mean vectors. We prove the asymptotic normality of the statistics concerned under mild conditions for high-dimensional data. We show that the asymptotic normality can be justified both theoretically and numerically, even for non-Gaussian data. We introduce the extended cross-data-matrix (ECDM) methodology to construct an unbiased estimator at a reasonable computational cost. With the help of the asymptotic normality, we show that the statistics given by ECDM ensure consistency properties for inference on multisample, high-dimensional mean vectors. We give several applications, such as confidence regions for high-dimensional mean vectors, confidence intervals for the squared norm, and a test for multisample mean vectors. We also provide sample size determination that satisfies a prespecified accuracy of inference. Finally, we give several examples using a microarray data set.

MSC:

62H10 Multivariate distribution of statistics
62L10 Sequential statistical analysis
60F05 Central limit and other weak theorems

References:

[1] Aoshima M, Yata K (2011a) Two-stage procedures for high-dimensional data. Seq Anal 30:356-399 (Editor’s special invited paper) · Zbl 1228.62096 · doi:10.1080/07474946.2011.619088
[2] Aoshima M, Yata K (2011b) Authors’ response. Seq Anal 30:432-440 · Zbl 1284.62499 · doi:10.1080/07474946.2011.619102
[3] Aoshima M, Yata K (2011c) Effective methodologies for statistical inference on microarray studies. In: Spiess PE (ed) Prostate cancer - from bench to bedside. InTech, pp 13-32
[4] Bai Z, Saranadasa H (1996) Effect of high dimension: by an example of a two sample problem. Stat Sin 6:311-329 · Zbl 0848.62030
[5] Chen SX, Qin YL (2010) A two-sample test for high-dimensional data with applications to gene-set testing. Ann Stat 38:808-835 · Zbl 1183.62095 · doi:10.1214/09-AOS716
[6] Chiaretti S, Li X, Gentleman R, Vitale A, Vignetti M, Mandelli F, Ritz J, Foa R (2004) Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood 103:2771-2778 · doi:10.1182/blood-2003-09-3243
[7] Ghosh M, Mukhopadhyay N, Sen PK (1997) Sequential estimation. Wiley, New York · Zbl 0953.62079 · doi:10.1002/9781118165928
[8] McLeish DL (1974) Dependent central limit theorems and invariance principles. Ann Probab 2:620-628 · Zbl 0287.60025 · doi:10.1214/aop/1176996608
[9] Pollard KS, Dudoit S, van der Laan MJ (2005) Multiple testing procedures: R multtest package and applications to genomics. In: Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S (eds) Bioinformatics and computational biology solutions using R and Bioconductor. Springer, New York, pp 249-271 · doi:10.1007/0-387-29362-0_15
[10] Srivastava MS (2005) Some tests concerning the covariance matrix in high dimensional data. J Jpn Stat Soc 35:251-272 · doi:10.14490/jjss.35.251
[11] Yata K, Aoshima M (2010) Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix. J Multivar Anal 101:2060-2077 · Zbl 1203.62112 · doi:10.1016/j.jmva.2010.04.006
[12] Yata K, Aoshima M (2012) Inference on high-dimensional mean vectors with fewer observations than the dimension. Methodol Comput Appl Probab 14:459-476 · Zbl 06124697 · doi:10.1007/s11009-011-9233-z
[13] Yata K, Aoshima M (2013) Correlation tests for high-dimensional data using extended cross-data-matrix methodology. J Multivar Anal 117:313-331 · Zbl 1277.62150 · doi:10.1016/j.jmva.2013.03.007