Controlling the false discovery rate for latent factors via unit-rank deflation. (English) Zbl 1473.62203

Summary: Although sparse factor regression is common in high-dimensional settings, it is still unclear how to control the false discovery rate (FDR) of latent factors. In this paper, we propose a variable selection procedure that addresses this issue and prove that the FDR can be asymptotically controlled at a target level. Moreover, our approach is scalable and memory-efficient in practice owing to its divide-and-conquer strategy.
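
For orientation, the quantity the summary refers to is the standard false discovery rate: the expected proportion of selected items that are not truly relevant. The short Python sketch below computes the empirical false discovery proportion (FDP) for a single selection. It is an illustration only, not the paper's procedure; the function name `false_discovery_proportion` and the index sets are hypothetical.

```python
def false_discovery_proportion(selected, true_support):
    """Fraction of selected indices that are not in the true support.

    Both arguments are iterables of indices (hypothetical example inputs).
    """
    selected = set(selected)
    true_support = set(true_support)
    false_discoveries = selected - true_support
    # Convention: the FDP is 0 when nothing is selected.
    return len(false_discoveries) / max(len(selected), 1)


# Illustrative values only: four selections, one of which (index 5) is false.
fdp = false_discovery_proportion(selected=[0, 1, 2, 5], true_support=[0, 1, 2, 3])
print(fdp)  # 0.25
```

The FDR is the expectation of this proportion over repeated samples. Controlling it at a target level q means keeping that expectation at or below q, which the summary states holds asymptotically for the proposed procedure.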

MSC:

62H25 Factor analysis and principal components; correspondence analysis
62J07 Ridge regression; shrinkage estimators (Lasso)

Software:

SOFAR
