A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. (English) Zbl 1505.65170

Summary: Updating the augmented Lagrangian multiplier by a closed-form expression yields an efficient first-order infeasible approach for optimization problems with orthogonality constraints, which makes parallelization tractable for this class of problems. Inspired by this closed-form updating scheme, we propose a novel penalty function with compact convex constraints (PenC). We show that PenC is an exact penalty model: it shares the same global minimizers as the original orthogonality-constrained problem. Based on PenC, we first propose a first-order algorithm, PenCF, and establish its global convergence and local linear convergence rate under mild assumptions. When computing and storing the Hessian is affordable and a high-precision solution with a fast local convergence rate is desired, we propose a second-order approach, PenCS, for solving PenC. To avoid expensive computations or a hard subproblem in forming the Newton step, we propose a strategy that computes the step approximately while still retaining local quadratic convergence. Moreover, the main iterations of both PenCF and PenCS are orthonormalization-free and hence parallelizable. Numerical experiments illustrate that PenCF is comparable with existing first-order methods, while PenCS is stable and highly efficient in obtaining high-precision solutions compared with existing second-order methods.
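The penalty idea behind such methods can be illustrated with a minimal quadratic-penalty sketch. This is only a generic stand-in, not the paper's PenC model (whose penalty includes a closed-form multiplier term and compact convex constraints); the test matrix A, the penalty parameter beta, and the step size are illustrative choices. Note that each iteration is orthonormalization-free: it never re-orthogonalizes X, only takes gradient steps on the penalized objective.

```python
import numpy as np

def penalty_merit(X, A, beta):
    """Quadratic-penalty merit for min trace(X^T A X) s.t. X^T X = I.
    Illustrative only -- not the paper's PenC penalty function."""
    p = X.shape[1]
    R = X.T @ X - np.eye(p)       # feasibility residual
    return np.trace(X.T @ A @ X) + 0.25 * beta * np.sum(R * R)

def penalty_grad_step(X, A, beta, step):
    """One orthonormalization-free gradient step on the merit function."""
    p = X.shape[1]
    G = 2.0 * A @ X + beta * X @ (X.T @ X - np.eye(p))
    return X - step * G

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20)); A = 0.5 * (A + A.T)   # symmetric test matrix
X = 0.8 * np.linalg.qr(rng.standard_normal((20, 3)))[0]  # infeasible starting point
m0 = penalty_merit(X, A, beta=50.0)
for _ in range(300):
    X = penalty_grad_step(X, A, beta=50.0, step=1e-3)
assert penalty_merit(X, A, beta=50.0) < m0  # small steps decrease the merit
```

With a plain quadratic penalty, the unconstrained minimizer is only approximately feasible for finite beta; the point of an exact penalty model such as PenC is that a fixed, finite penalty parameter already reproduces the global minimizers of the constrained problem.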

MSC:

65F15 Numerical computation of eigenvalues and eigenvectors of matrices
65K05 Numerical mathematical programming methods
90C06 Large-scale problems in mathematical programming

Software:

KSSOLV; ARock; Manopt; HOGWILD
