
A unified single-loop alternating gradient projection algorithm for nonconvex-concave and convex-nonconcave minimax problems. (English) Zbl 1522.90261

Summary: Much recent research effort has been directed toward developing efficient algorithms with theoretical convergence guarantees for solving minimax problems, owing to their relevance in several emerging applications. In this paper, we propose a unified single-loop alternating gradient projection (AGP) algorithm for solving smooth nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. AGP employs simple gradient projection steps to update the primal and dual variables alternately at each iteration. We show that it finds an \(\varepsilon\)-stationary point of the objective function in \({\mathcal{O}}\left(\varepsilon^{-2}\right)\) (resp. \({\mathcal{O}}\left(\varepsilon^{-4}\right)\)) iterations in the nonconvex-strongly concave (resp. nonconvex-concave) setting. Moreover, its gradient complexity for obtaining an \(\varepsilon\)-stationary point of the objective function is bounded by \({\mathcal{O}}\left(\varepsilon^{-2}\right)\) (resp. \({\mathcal{O}}\left(\varepsilon^{-4}\right)\)) in the strongly convex-nonconcave (resp. convex-nonconcave) setting. To the best of our knowledge, this is the first simple and unified single-loop algorithm developed for solving both nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems; moreover, complexity results for the latter (strongly) convex-nonconcave minimax problems had not previously been obtained in the literature. Numerical results demonstrate the efficiency of the proposed AGP algorithm. Furthermore, we extend the AGP algorithm by presenting a block alternating proximal gradient (BAPG) algorithm for solving more general multi-block nonsmooth nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems, and we similarly establish the gradient complexity of this algorithm under the same four settings.
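For orientation, the basic template underlying such single-loop schemes is an alternating projected-gradient step: a descent step in the primal variable followed by an ascent step in the dual variable. The display below is a simplified sketch under assumed Euclidean projections and generic step sizes; the AGP algorithm of the paper additionally involves regularization terms and specific step-size rules that are not reproduced here:
\[
x_{k+1} = \mathcal{P}_{\mathcal{X}}\bigl(x_k - \alpha_k \nabla_x f(x_k, y_k)\bigr), \qquad
y_{k+1} = \mathcal{P}_{\mathcal{Y}}\bigl(y_k + \beta_k \nabla_y f(x_{k+1}, y_k)\bigr),
\]
where \(f\) is the smooth coupling function, \(\mathcal{P}_{\mathcal{X}}\) and \(\mathcal{P}_{\mathcal{Y}}\) denote projections onto the primal and dual feasible sets, and \(\alpha_k, \beta_k > 0\) are step sizes. Using the freshly updated iterate \(x_{k+1}\) in the dual step is what makes the updates alternating rather than simultaneous.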

MSC:

90C47 Minimax problems in mathematical programming
90C26 Nonconvex programming, global optimization
90C30 Nonlinear programming
