Biostatistics. 2009 Jul;10(3):515-34.

doi: 10.1093/biostatistics/kxp008. Epub 2009 Apr 17.

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis

Daniela M Witten¹, Robert Tibshirani, Trevor Hastie

PMID: 19377034
PMCID: PMC2697346
DOI: 10.1093/biostatistics/kxp008

Daniela M Witten et al. Biostatistics. 2009 Jul.

Affiliation

¹ Department of Statistics, Stanford University, Stanford, CA 94305, USA. dwitten@stanford.edu

Abstract

We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared Frobenius norm of $X - \hat{X}$, subject to penalties on $u_k$ and $v_k$. This results in a regularized version of the singular value decomposition. Of particular interest is the use of $L_1$-penalties on $u_k$ and $v_k$, which yields a decomposition of $X$ using sparse vectors. We show that when the PMD is applied using an $L_1$-penalty on $v_k$ but not on $u_k$, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.
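
The abstract does not spell out an algorithm, so the following is only a minimal sketch of how a single-factor (K = 1) decomposition with L1-constraints on both vectors might be computed: alternate soft-thresholded updates of u and v, each rescaled to unit L2 norm, with the threshold found by bisection. The function names (`soft_threshold`, `l1_constrained_unit_vector`, `pmd_rank1`), the bounds `c_u` and `c_v`, the SVD-based starting point, and the iteration counts are all illustrative assumptions, not the authors' code.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_constrained_unit_vector(a, c, n_bisect=60):
    """Soft-threshold `a` and rescale to unit L2 norm so the L1 norm is <= c.

    Uses delta = 0 when that already meets the L1 bound; otherwise bisects
    on delta. Assumes c >= 1 and that `a` is not all zeros.
    """
    u = a / np.linalg.norm(a)
    if np.abs(u).sum() <= c:
        return u
    lo, hi = 0.0, np.max(np.abs(a))
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        if np.abs(s).sum() <= c * np.linalg.norm(s):
            hi = mid  # feasible: try a smaller threshold
        else:
            lo = mid  # L1 norm still too large: threshold harder
    s = soft_threshold(a, hi)
    if not s.any():
        # Edge case (tied maxima): fall back to a single nonzero entry.
        s = np.zeros_like(a)
        j = np.argmax(np.abs(a))
        s[j] = np.sign(a[j])
    return s / np.linalg.norm(s)


def pmd_rank1(X, c_u, c_v, n_iter=100):
    """Sketch of a rank-1 penalized decomposition X ~ d * outer(u, v)."""
    # Assumed starting point: the leading right singular vector of X.
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = l1_constrained_unit_vector(X @ v, c_u)
        v = l1_constrained_unit_vector(X.T @ u, c_v)
    d = float(u @ X @ v)
    return d, u, v


# Synthetic example (made-up data): one sparse factor plus Gaussian noise.
rng = np.random.default_rng(0)
u_true = np.zeros(30)
u_true[:5] = 1.0
v_true = np.zeros(20)
v_true[:4] = 1.0
X = 10.0 * np.outer(u_true, v_true) + rng.standard_normal((30, 20))
d, u, v = pmd_rank1(X, c_u=2.3, c_v=2.1)
print("nonzeros in u:", np.count_nonzero(u), "nonzeros in v:", np.count_nonzero(v))
```

Under the same assumptions, passing a cross-products matrix such as `X.T @ Y` in place of `X` would correspond to the penalized CCA setting described in the abstract, and leaving u effectively unconstrained (a large `c_u`) would correspond to the sparse principal components setting.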

Figures

**Fig. 1.**
A graphical representation of the L₁- and L₂-constraints on u in the PMD(L₁, L₁) criterion. The constraints are as follows: ‖u‖₂² ≤ 1 and ‖u‖₁ ≤ c. Here, u is two-dimensional, and the grey lines indicate the coordinate axes u₁ and u₂. Left: the L₂-constraint is the solid circle. For both the L₁- and L₂-constraints to be active, c must be between 1 and √2. The constraints ‖u‖₁ = 1 and ‖u‖₁ = √2 are shown using dashed lines. Right: the L₂- and L₁-constraints on u are shown for some c between 1 and √2. Small circles indicate the points where both the L₁- and the L₂-constraints are active. The solid arcs indicate the solutions that occur when Δ₁ = 0 in Algorithm 3. The figure shows that in 2D, the points where both the L₁- and L₂-constraints are active do not have either u₁ or u₂ equal to 0.

**Fig. 2.**
Simulated CGH data. Top: results of PMD(L₁, FL); middle: results of PMD(L₁, L₁); bottom: generative model. PMD(L₁, FL) successfully identifies both the region of gain and the subset of samples for which that region is present.

**Fig. 3.**
Breast cancer gene expression data: a greater proportion of variance is explained when SPC is used to obtain the sparse principal components, rather than SPCA. Multiple SPC components were obtained as described in Algorithm 2.

**Fig. 4.**
The efficacy of PMD(L₁, L₁) is demonstrated using a simulation in which X is generated from 2 sparse latent factors, called u₁ and u₂, and Y is generated from 2 sparse latent factors, called v₁ and v₂. The PMD(L₁, L₁) method identifies linear combinations of these sparse factors. Details of the simulation are given in Section A. of the Appendix.

**Fig. 5.**
PMD(L₁, FL) was performed for the breast cancer data set. Left: for each chromosome, the weights of v obtained using PMD(L₁, FL) are shown. All the v weights shown are positive, but the results would not be affected by flipping the signs of both v and u. On chromosome 2, v has no nonzero elements. Right: for each chromosome, u and v were computed on a training set consisting of 3/4 of the samples, and cor(Xu, Yv) is plotted, where X and Y are the training (dashed) and test (solid) data.
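
The range of c discussed in the Fig. 1 caption follows from a standard norm inequality. As a short worked derivation (the dimension p is notation introduced here for illustration), for a vector u with p entries and unit L₂ norm:

```latex
\[
\|u\|_1^2 = \Bigl(\sum_{i=1}^{p} |u_i|\Bigr)^2 \;\ge\; \sum_{i=1}^{p} u_i^2 = \|u\|_2^2 = 1,
\qquad
\|u\|_1 = \sum_{i=1}^{p} |u_i| \cdot 1 \;\le\; \sqrt{p}\,\|u\|_2 = \sqrt{p}
\quad\text{(Cauchy--Schwarz)}.
\]
\[
\text{Hence } \|u\|_2 = 1 \text{ and } \|u\|_1 = c \text{ can hold together only if } 1 \le c \le \sqrt{p}.
\]
```

With p = 2, as in the figure, the upper end of this range is √2.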

Cited by

Contrastive Functional Connectivity Defines Neurophysiology-informed Symptom Dimensions in Major Depression.
Zhu H, Tong X, Carlisle NB, Xie H, Keller CJ, Oathes DJ, Nemeroff CB, Fonzo GA, Zhang Y. bioRxiv [Preprint]. 2024 Oct 7:2024.10.04.616707. doi: 10.1101/2024.10.04.616707. PMID: 39416217. Free PMC article. Preprint.
Transparency and precision in the age of AI: evaluation of explainability-enhanced recommendation systems.
Govea J, Gutierrez R, Villegas-Ch W. Front Artif Intell. 2024 Sep 5;7:1410790. doi: 10.3389/frai.2024.1410790. eCollection 2024. PMID: 39301478. Free PMC article.
Protocol to infer and analyze miRNA sponge modules in heterogeneous data using miRSM 2.0.
Zhang J, Wei X, Zhao C, Yang H. STAR Protoc. 2024 Sep 17;5(4):103317. doi: 10.1016/j.xpro.2024.103317. Online ahead of print. PMID: 39292559. Free PMC article.
Stable biomarker discovery in multi-omics data via canonical correlation analysis.
Pusa T, Rousu J. PLoS One. 2024 Sep 9;19(9):e0309921. doi: 10.1371/journal.pone.0309921. eCollection 2024. PMID: 39250478. Free PMC article.
Limited generalizability of multivariate brain-based dimensions of child psychiatric symptoms.
Xu B, Dall'Aglio L, Flournoy J, Bortsova G, Tervo-Clemmens B, Collins P, de Bruijne M, Luciana M, Marquand A, Wang H, Tiemeier H, Muetzel RL. Commun Psychol. 2024 Feb 28;2(1):16. doi: 10.1038/s44271-024-00063-y. PMID: 39242757. Free PMC article.
