
Block basis factorization for scalable kernel evaluation. (English) Zbl 1453.65091

Summary: Kernel methods are widespread in machine learning; however, they are limited by the quadratic cost, in the number of data points, of constructing, applying, and storing kernel matrices. Low-rank matrix approximation algorithms are widely used to reduce this arithmetic and storage cost. However, we observed that for some datasets with wide intraclass variability, the optimal kernel parameter for smaller classes yields a matrix that is less well approximated by low-rank methods. In this paper, we propose an efficient structured low-rank approximation method – the block basis factorization (BBF) – together with a fast construction algorithm, to approximate radial basis function kernel matrices. Our approach has linear memory and floating-point operation costs for many machine learning kernels. BBF works for a wide range of kernel bandwidth parameters and significantly extends the domain of applicability of low-rank approximation methods. Our empirical results demonstrate its stability and its superiority over state-of-the-art kernel approximation algorithms.
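
To make the structure in the summary concrete, here is a minimal sketch, not the authors' algorithm, of a BBF-style two-level approximation K ≈ U C Uᵀ of an RBF kernel matrix: the points are clustered, an orthonormal basis is computed for each cluster's block row with a randomized range finder, and a small inner matrix C couples the clusters. Every name and parameter choice here (bbf_approx, rbf, k-means clustering, k clusters, rank r) is an illustrative assumption.

```python
# A minimal sketch, assuming a Gaussian RBF kernel, k-means clustering,
# and a randomized range finder; names and parameters are illustrative,
# not the authors' implementation.
import numpy as np
from sklearn.cluster import KMeans

def rbf(X, Y, gamma):
    """Gaussian RBF kernel matrix exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def bbf_approx(X, gamma, k=4, r=20, seed=0):
    """BBF-style factorization K ~= U C U^T with block-diagonal U.

    Stores one n_i x r basis per cluster plus a small (sum r_i)-square
    inner matrix, so memory is linear in n for fixed k and r.
    """
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    idx = [np.flatnonzero(labels == c) for c in range(k)]
    bases = []
    for I in idx:
        # Orthonormal basis for the block row K(X_I, X), obtained by
        # sketching it with a Gaussian test matrix and taking a thin QR.
        S = rbf(X[I], X, gamma) @ rng.standard_normal((X.shape[0], r))
        Q, _ = np.linalg.qr(S)
        bases.append(Q)
    # Inner coefficient blocks C_ij = U_i^T K(X_i, X_j) U_j.
    C = np.block([[bases[i].T @ rbf(X[idx[i]], X[idx[j]], gamma) @ bases[j]
                   for j in range(k)] for i in range(k)])
    return idx, bases, C

# Usage: compare against the dense kernel on a small synthetic set.
X = np.random.default_rng(1).standard_normal((400, 3))
idx, bases, C = bbf_approx(X, gamma=0.5)
perm = np.concatenate(idx)                 # cluster-ordered indices
U = np.zeros((len(perm), C.shape[0]))
row = col = 0
for Q in bases:                            # assemble block-diagonal U
    U[row:row + Q.shape[0], col:col + Q.shape[1]] = Q
    row += Q.shape[0]; col += Q.shape[1]
K = rbf(X[perm], X[perm], 0.5)
err = np.linalg.norm(K - U @ C @ U.T) / np.linalg.norm(K)
print(f"relative Frobenius error: {err:.2e}")
```

In this sketch only the per-cluster bases and the small inner matrix are stored, which is the source of the linear memory footprint that the summary claims for the method.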

MSC:

65F55 Numerical methods for low-rank matrix approximation; matrix compression
65D12 Numerical radial basis function approximation
68T05 Learning and adaptive systems in artificial intelligence

Software:

LIBSVM; UCI-ml
