Abstract
Principal component analysis (PCA) is a well-established dimensionality reduction method commonly used to denoise and visualise data. A classical PCA model is the fixed effect model, in which the data are generated as a fixed low-rank structure corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as ridge regression, we propose a regularised version of PCA that essentially selects a certain number of dimensions and shrinks the corresponding singular values. Each singular value is multiplied by a term that can be seen as the ratio of the signal variance to the total variance of the associated dimension. The regularisation term is derived analytically from asymptotic results and can also be justified by a Bayesian treatment of the model. Regularised PCA gives promising results, both for recovering the true signal and for the graphical outputs, compared with classical PCA and with a soft-thresholding estimation strategy. The distinction between PCA and regularised PCA becomes especially important when the data are very noisy.
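The shrinkage idea in the abstract can be illustrated with a minimal NumPy sketch. It keeps S dimensions of a centred SVD and multiplies each retained singular value by (λ_s − σ̂²)/λ_s, read here as "signal variance over total variance" of dimension s, with λ_s = d_s²/n. This is an illustration under stated assumptions, not the authors' implementation: the noise-variance estimate σ̂² (the plain mean of the discarded eigenvalues) is a simplifying assumption and may differ from the article's estimator, and the function names `truncated_pca`, `regularised_pca` and `soft_threshold_svd` are hypothetical. Classical PCA (no shrinkage) and soft thresholding (subtracting a constant τ from every singular value) are included for comparison.

```python
import numpy as np


def _centred_svd(X):
    """Column-centre X and return the column means plus the thin SVD of the centred data."""
    mean = X.mean(axis=0)
    U, d, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, U, d, Vt


def truncated_pca(X, S):
    """Classical PCA reconstruction: keep the first S dimensions unchanged."""
    mean, U, d, Vt = _centred_svd(X)
    return (U[:, :S] * d[:S]) @ Vt[:S] + mean


def regularised_pca(X, S):
    """Keep S dimensions and shrink singular value d_s by (lam_s - sigma2) / lam_s.

    Assumption: sigma2 (the noise variance) is estimated as the plain mean of the
    discarded eigenvalues; the article's own estimator may differ.
    """
    n = X.shape[0]
    mean, U, d, Vt = _centred_svd(X)
    lam = d**2 / n                      # variance carried by each dimension
    sigma2 = lam[S:].mean()             # hypothetical noise-variance estimate
    # signal variance over total variance, clipped at zero so factors lie in [0, 1]
    shrink = np.maximum(lam[:S] - sigma2, 0.0) / lam[:S]
    return (U[:, :S] * (d[:S] * shrink)) @ Vt[:S] + mean


def soft_threshold_svd(X, tau):
    """Soft thresholding: subtract tau from every singular value, flooring at zero."""
    mean, U, d, Vt = _centred_svd(X)
    return (U * np.maximum(d - tau, 0.0)) @ Vt + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, S = 100, 20, 2
    signal = rng.normal(size=(n, S)) @ rng.normal(size=(S, p))   # rank-2 structure
    X = signal + rng.normal(scale=1.0, size=(n, p))               # plus Gaussian noise
    # tau is chosen by hand, near the largest singular value expected from pure
    # noise of this size (roughly sqrt(n) + sqrt(p)); it is not tuned.
    for name, fit in [("PCA", truncated_pca(X, S)),
                      ("regularised PCA", regularised_pca(X, S)),
                      ("soft thresholding", soft_threshold_svd(X, tau=15.0))]:
        print(f"{name}: MSE = {np.mean((fit - signal) ** 2):.4f}")
```

Because every shrinkage factor lies in [0, 1], the regularised fit never has a larger (centred) Frobenius norm than the classical PCA fit with the same S. On noise-free rank-S data the discarded eigenvalues are essentially zero, so the factors approach 1 and the method reduces to ordinary PCA.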
Cite this article
Verbanck, M., Josse, J. & Husson, F. Regularised PCA to denoise and visualise data. Stat Comput 25, 471–486 (2015). https://doi.org/10.1007/s11222-013-9444-y