×

Sparse partial least squares regression for on-line variable selection with multivariate data streams. (English) Zbl 07260241

Summary: Data streams arise in several domains. For instance, in computational finance, several statistical applications revolve around the real-time discovery of associations between a very large number of co-evolving data feeds representing asset prices. The problem we tackle in this paper consists of learning a linear regression function from multivariate input and output streaming data in an incremental fashion while also performing dimensionality reduction and variable selection. When input and output streams are high-dimensional and correlated, it is plausible to assume the existence of hidden factors that explain a large proportion of the covariance between them. The methods we propose build on recursive partial least squares (PLS) regression. The hidden factors are dynamically inferred and tracked over time and, within each factor, the most important streams are recursively identified by means of sparse matrix decompositions. Moreover, the recursive regression model is able to adapt to sudden changes in the data generating mechanism and also identifies the number of latent factors. Extensive simulation results illustrate how the methods perform and compare with alternative penalized regression models for streaming data. We also apply the algorithm to solve a multivariate version of the enhanced index tracking problem in computational finance.

MSC:

62-XX Statistics
68-XX Computer science

Software:

PMA; mixOmics
Full Text: DOI

References:

[1] O. Nasraoui, C. Rojas, and C. Cardona, A framework for mining evolving trends in web data streams using dynamic learning and retrospective validation, J Comput Network, 50(10) (2006), 1425-1652. (Special Issue on Web Dynamics).
[2] S.-K. Ng, G. J. McLachlan, and A. H Lee, An incremental em-based learning approach for on-line prediction of hospital resource utilization, Artif Intell Med 36 (2006), 257-267.
[3] Y. Zhu and D. Shasha, Statstream: statistical monitoring of thousands of data streams in real time, In Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002.
[4] R Tibshirani, Regression shrinkage and selection via the lasso, J Roy Stat Soc B, 58(1) (1996), 267-288. · Zbl 0850.62538
[5] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, Ann Stat, 32 (2004), 407-499. · Zbl 1091.62054
[6] J. Friedman, E. Hastie, H. H¨ofling, and R. Tibshirani, Pathwise coordinate optimization, Ann Appl Stat, 1(2) (2007), 302-332. · Zbl 1378.90064
[7] V. DeMiguel, L. Garlappi, F. J. Nogales, and R. Uppal, A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms, Manage Sci, 55(5) (2007), 798-812. · Zbl 1232.91617
[8] J. Brodie, I. Daubechies, C. De Mol, C. Giannone, and I. Loris, Sparse and stable Markowitz portfolios. European Central Bank Working Paper Series 936, 2008. · Zbl 1203.91271
[9] H. Wang, W. Fan, P. S. Yu, and J. Han, Mining concept-drifting data streams using ensemble classifiers, In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, Washington, DC, 2003, 226-235.
[10] J. Knight and S. Satchell, eds. Linear Factor Models in Finance, Oxford, UK, A Butterworth-Heinemann Title, 2004.
[11] J. Weng, Y. Zhang, and W. S. Hwang, Candid covariancefree incremental principal component analysis, IEEE Trans Pattern Anal Machine Intell, 25(8) (2003), 1034-1040.
[12] S. Papadimitriou, J. Sun, and C. Faloutsos, Streaming pattern discovery in multiple time-series, In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, 2005, 697-708.
[13] B. Yang, Projection approximation subspace tracking, IEEE Trans Signal Process, 43 (1995), 95-107.
[14] A. Shu-Yan Wong, K. Wong, and C. Leung, A practical sequential method for principal component analysis, Neural Process Lett 11 (2000), 107-112. · Zbl 1147.68659
[15] Z. Wang, Y. Lee, S. Fiori, C. S. Leung, and Y.-S. Zhu, An improved sequential method for principal component analysis, Pattern Recognit Lett, 24 (2003), 1409-1415. · Zbl 1048.68093
[16] B. S. Dayal and J. F. Macgregor, Improved PLS algorithms, J Chemom, 11(1) (1997), 73-85.
[17] S. Vijayakumar, A. D’Souza, and S. Schaal, Incremental online learning in high dimensions, Neural Comput, 17 (2005), 2602-2634.
[18] S. Haykin, Adaptive Filter Theory, Upper Saddle River, New Jersey, Prentice Hall, 2001. · Zbl 0723.93070
[19] S.-P. Kim, Y. N. Rao, D. Erdogmus, and J. C. Principe, Tracking of multivariate time-variant systems based on online variable selection, In 2004 IEEE Workshop on Machine Learning for Signal Processing, Sao Luis, Brazil, 2004, 123-132.
[20] C. Anagnostopoulos, D. Tasoulis, D. J. Hand, and N. M. Adams, Online optimisation for variable selection on data streams, Proceedings of the 18th European Conference on Artificial Intelligence, Patros, Greece, 2008, 132-136.
[21] S Erlich and K Yao. Convergences of adaptive block simultaneous iteration method for eigenstructure decomposition. Signal Process, 37 (1994), 1-13. · Zbl 0810.94008
[22] A. J. Izenman, Regression, classification, and Manifold learning,ModernMultivariateStatisticalTechniques, Springer Texts in Statistics, chapter 6, 2008, 107-190. · Zbl 1155.62040
[23] R. Rosipal and N. Kr¨amer, Overview and recent advances in partial least squares, Subspace, Latent Structure and Feature Selection Techniques, Berlin, Springer, 2006, 34-51.
[24] A. Hoskuldsson, PLS regression methods, J Chemom, 2(3) (1988), 211-228.
[25] H. Wold, Estimation of principal components and related models by iterative least squares, Multivariate Analysis, New York, Academic Press, 1966. · Zbl 0214.46103
[26] J. Wegelin, A survey of partial least squares (PLS) methods, with emphasis on the two-block case, Technical report, University of Washington, 2000.
[27] L. Gidskehaug, H. Stdkilde-Jrgensen, M. Martens, and H. Martens, Bridge-PLS regression: two-block bilinear regression without deflation, J Chemom, 18 (2004), 208-215.
[28] H. Shen and J. Huang, Sparse principal component analysis via regularized low rank matrix approximation, J Multivariate Anal, 99 (2008), 1015-1034. · Zbl 1141.62049
[29] D. M. Witten, R. Tibshirani, and T. Hastie, A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics 10 (2009), 515-534. · Zbl 1437.62658
[30] K. Lˆe Cao, D. Rossouw, C. Robert-Grani´e, and P. Besse, Sparse PLS: variable selection when integrating omic data, Technical report, INRA, 2008. · Zbl 1276.62061
[31] G. Golub and C. F. Van Loan, Matrix Computations, Baltimore, The Johns Hopkins University Press, 1996. · Zbl 0865.65009
[32] L. Ljung. System Identification: Theory for the User (2nd ed.), Upper Saddle River, New Jersey, Prentice Hall PTR, 1998. · Zbl 0615.93004
[33] C. Paleologu, J. Benesty, and S. Ciochina, A robust variable forgetting factor recursive least-squares algorithm for system identification, IEEE Signal Process Lett, 15 (2008), 597-600.
[34] J. Sherman and W. J. Morrison, Adjustment of an inverse matrix corresponding to a change in one element of a given matrix, Ann Math Stat, 21 (1950), 124-127. · Zbl 0037.00901
[35] S.-H. Leung and C.F. So, Gradient-based variable forgetting factor rls algorithm in time-varying environments, IEEE Trans Signal Process, 53(8) (2005), 3141-3150. · Zbl 1373.62465
[36] M. Basseville and I.V. Nikiforov, Detection of abrupt changes: theory and application, New Jersey, Prentice-Hall, 1993. · Zbl 1407.62012
[37] E. Bair, T. Hastie, D. Paul, and R. Tibshirani, Prediction by supervised principal components, J Amer Stat Assoc, 101 (2006), 119-137. · Zbl 1118.62326
[38] J. Miao and C. L. Dunis, Volatility filters for dynamic portfolio optimization, Appl Financial Econ Lett, 1(2) (2005), 111-119.
[39] J.E. Beasley, N. Meade, and T. J. Chang, An evolutionary heuristic for the index tracking problem, Eur J Oper Res, 148 (2003), 621-643. · Zbl 1037.90038
[40] M. Gilli and E. K¨ellezi, Threshold accepting for index tracking, Technical report, Department of Econometrics, University of Geneva, Switzerland, 2001. · Zbl 1029.91514
[41] C. Alexander and A. Dimitriu, Sources of over-performance in equity markets: mean reversion, common trends and herding, Technical report, ISMA Center, University of Reading, UK, 2005.
[42] C. Alexander and A. Dimitriu, Equity indexing: optimising passive investments. Quant Finance 4(3) (2004), 30-33.
[43] A. Rudd, Optimal selection of passive portfolios, Financial Manage, 1 (1980), 57-66.
[44] R. A. Haugen and N.L. Baker, Dedicated stock portfolios, J Portfolio Manage, 16(4) (1990), 17-22.
[45] C. Alexander and A. Dimitriu, The cointegration alpha: enhanced index tracking and long-short equity market neutral strategies. Social Science Research Network Working Paper Series, June 2002.
[46] M. Brand, Fast online svd revisions for lightweight recommender systems, Technical report, Mitsubishi Electric Research Laboratory, 2003.
[47] M. Shen, D. Zhu, and Z. Zhu, Reduced-rank spacetime adaptive processing using a modified projection approximation subspace tracking deflation approach, IET Radar, Sonar Navigation, 3(1) (2008), 93-100.
[48] A.ValizadehandMahmoodKarimi,Fastsubspace tracking algorithm based on the constrained projection approximation, EURASIP J Adv Signal Process (2009). · Zbl 1184.94188
[49] S.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.