
Penalized partial least square applied to structured data. (English) Zbl 1441.62140

Summary: The analysis of high-dimensional data has become common. High-dimensional data sets are often assembled by pooling several independent data sets, but each independent set can introduce its own bias. This bias can be handled by incorporating the observation-set structure into the model. The goal of this article is to build the theoretical background for the dimension-reduction method sparse Partial Least Squares (sPLS) in the context of data presenting such an observation-set structure. The innovation consists in building different sPLS models and linking them through a common-Lasso penalization. The theory can be applied to any field where observations present this kind of structure, and can therefore improve sPLS in the domains where it is competitive. Furthermore, it can be extended to the particular case where variables can be gathered into groups given a priori, in which case sPLS is defined as a sparse group Partial Least Squares.
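
For orientation, the following is a minimal sketch of how the sparse weight vectors of a first sPLS component can be computed for a single data set, using the soft-thresholded rank-one approximation associated with Shen and Huang [15] and Lê Cao et al. [8]. It is not the method of the article: the common-Lasso penalty that links the models across observation sets is not reproduced here, since the summary does not state its form. The function names, the penalty parameter `lam`, and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def soft_threshold(z, lam):
    """Element-wise Lasso shrinkage: move each entry toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def spls_first_weights(X, Y, lam, n_iter=100, tol=1e-8):
    """Sparse X-weights u and Y-weights v for a first sPLS component.

    Assumes the columns of X and Y are already centred. Only the
    X-weights are penalized in this sketch (hypothetical choice).
    """
    M = X.T @ Y  # cross-product matrix between the two blocks
    # Start from the leading right singular vector of M.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    v = Vt[0]
    u = np.zeros(M.shape[0])
    for _ in range(n_iter):
        u_new = soft_threshold(M @ v, lam)
        norm_u = np.linalg.norm(u_new)
        if norm_u == 0.0:
            return u_new, v  # the penalty removed every variable
        u_new = u_new / norm_u
        v = M.T @ u_new
        v = v / np.linalg.norm(v)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return u, v


# Simulated example: only the first two X variables drive Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(50, 3))
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)
u, v = spls_first_weights(X, Y, lam=5.0)
print("non-zero X-weights:", np.flatnonzero(u))
```

With `lam=0` the sketch reduces to the leading singular vectors of the cross-product matrix, that is, to an unpenalized first PLS component; increasing `lam` sets more X-weights exactly to zero.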

MSC:

62H12 Estimation in multivariate analysis
62J07 Ridge regression; shrinkage estimators (Lasso)
62R07 Statistical aspects of big data and data science

Software:

mixOmics; Eigenstrat

References:

[1] Boulesteix, A-L; Strimmer, K., Partial least squares: a versatile tool for the analysis of high-dimensional genomic data, Brief. Bioinform., 8, 1, 32-44 (2006)
[2] Chun, H.; Keleş, S., Sparse partial least squares regression for simultaneous dimension reduction and variable selection, J. R. Stat. Soc. Ser. B (Stat. Methodol.), 72, 1, 3-25 (2010) · Zbl 1411.62184
[3] De Bie, T.; Cristianini, N.; Rosipal, R.: Eigenproblems in pattern recognition. In: Handbook of Geometric Computing, pp. 129-167. Springer (2005)
[4] Eslami, A.; Qannari, EM; Kohler, A.; Bougeard, S., Algorithms for multi-group PLS, J. Chemom., 28, 3, 192-201 (2014)
[5] Gagnon-Bartsch, JA; Speed, TP, Using control genes to correct for unwanted variation in microarray data, Biostatistics, 13, 3, 539-552 (2012)
[6] Geladi, P.; Kowalski, BR, Partial least-squares regression: a tutorial, Anal. Chim. Acta, 185, 1-17 (1986)
[7] Wold, H.: Path models with latent variables: the NIPALS approach. In: Quantitative Sociology, pp. 307-357. Elsevier (1975)
[8] Lê Cao, K-A; Rossouw, D.; Robert-Granié, C.; Besse, P., A sparse PLS for variable selection when integrating omics data, Stat. Appl. Genet. Mol. Biol., 7, 1, 35 (2008) · Zbl 1276.62061
[9] Liquet, B.; de Micheaux, PL; Hejblum, BP; Thiébaut, R., Group and sparse group partial least square approaches applied in genomics context, Bioinformatics, 32, 1, 35-42 (2015)
[10] Liquet, B.; Mengersen, K.; Pettitt, AN; Sutton, M., Bayesian variable selection regression of multivariate responses for group data, Bayesian Anal., 12, 4, 1039-1067 (2017) · Zbl 1384.62259
[11] Paaby, AB; Rockman, MV, The many faces of pleiotropy, Trends Genet., 29, 2, 66-73 (2013)
[12] Price, AL; Patterson, NJ; Plenge, RM; Weinblatt, ME; Shadick, NA; Reich, D., Principal components analysis corrects for stratification in genome-wide association studies, Nat. Genet., 38, 8, 904 (2006)
[13] Rohart, F.; Eslami, A.; Matigian, N.; Bougeard, S.; Lê Cao, K-A, MINT: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms, BMC Bioinform., 18, 1, 128 (2017)
[14] Seoane, JA; Campbell, C.; Day, INM; Casas, JP; Gaunt, TR, Canonical correlation analysis for gene-based pleiotropy discovery, PLoS Comput. Biol., 10, 10, e1003876 (2014)
[15] Shen, H.; Huang, JZ, Sparse principal component analysis via regularized low rank matrix approximation, J. Multivar. Anal., 99, 6, 1015-1034 (2008) · Zbl 1141.62049
[16] Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R., A sparse-group lasso, J. Comput. Graph. Stat., 22, 2, 231-245 (2013)
[17] Subramanian, A.; Tamayo, P.; Mootha, VK; Mukherjee, S.; Ebert, BL; Gillette, MA; Paulovich, A.; Pomeroy, SL; Golub, TR; Lander, ES, Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles, Proc. Natl. Acad. Sci., 102, 43, 15545-15550 (2005)
[18] Sutton, M.; Thiébaut, R.; Liquet, B., Sparse partial least squares with group and subgroup structure, Stat. Med., 37, 23, 3338-3356 (2018)
[19] Tenenhaus, A.; Philippe, C.; Guillemot, V.; Le Cao, K-A; Grill, J.; Frouin, V., Variable selection for generalized canonical correlation analysis, Biostatistics, 15, 3, 569-583 (2014)
[20] Vinzi, VE; Trinchera, L.; Amato, S.: PLS path modeling: from foundations to recent developments and open issues for model assessment and improvement. In: Handbook of Partial Least Squares, pp. 47-82. Springer (2010)
[21] Walker, SJ, Big data: a revolution that will transform how we live, work, and think, Int. J. Advert., 33, 1, 181-183 (2014)
[22] Wang, T.; Ho, G.; Ye, K.; Strickler, H.; Elston, RC, A partial least-square approach for modeling gene-gene and gene-environment interactions when multiple markers are genotyped, Genet. Epidemiol., 33, 1, 6-15 (2009)
[23] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B (Stat. Methodol.), 68, 1, 49-67 (2006) · Zbl 1141.62030
[24] Zou, H.; Hastie, T.; Tibshirani, R., Sparse principal component analysis, J. Comput. Graph. Stat., 15, 2, 265-286 (2006)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.