
Mini-batch stochastic subgradient for functional constrained optimization. (English) Zbl 1542.65069

Summary: In this paper, we consider finite sum composite optimization problems with many functional constraints. The objective function is expressed as a finite sum of two terms, one of which admits easy computation of (sub)gradients while the other is amenable to proximal evaluations. We assume a generalized bounded gradient condition on the objective which allows us to tackle smooth and nonsmooth problems simultaneously, and we consider the cases both with and without a quadratic functional growth property. Further, we assume that each constraint set is given as the level set of a convex but not necessarily differentiable function. We reformulate the constrained finite sum problem as a stochastic optimization problem, to which the stochastic subgradient projection method from [I. Necoara and N. K. Singh, “Stochastic subgradient projection methods for composite optimization with functional constraints”, J. Mach. Learn. Res. 23, 1–35 (2022)] specializes as a collection of mini-batch variants, with different mini-batch sizes for the objective function and the functional constraints, respectively. More specifically, at each iteration our algorithm takes a mini-batch stochastic proximal subgradient step aimed at minimizing the objective function, followed by a mini-batch subgradient projection step that minimizes the feasibility violation. By specializing to different mini-batching strategies, we derive exact expressions for the stepsizes as a function of the mini-batch size, and in some cases we also derive insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime. We also prove sublinear convergence rates for the mini-batch subgradient projection algorithm which depend explicitly on the mini-batch sizes and on the properties of the objective function. Numerical results also show that our mini-batch scheme performs better than its single-batch counterpart.

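To make the two-step scheme concrete, the following is a schematic sketch of one iteration in notation chosen for this review; the mini-batch index sets \(I_t\), \(J_t\), the stepsizes \(\alpha_t\), \(\beta\), and the exact form of the proximal term are illustrative assumptions and need not match the paper's. Writing the problem, in one possible reading of the summary, as
\[
\min_{x\in\mathbb{R}^n} \; \frac{1}{N}\sum_{i=1}^{N}\big(f_i(x)+h_i(x)\big)
\quad \text{subject to} \quad g_j(x)\le 0,\ \ j=1,\dots,m,
\]
with the \(f_i\) accessible through (sub)gradients and the \(h_i\) through proximal maps, one iteration could read
\[
v_t = \operatorname{prox}_{\alpha_t \bar h_{I_t}}\!\Big(x_t - \frac{\alpha_t}{|I_t|}\sum_{i\in I_t} s_i\Big),
\qquad s_i\in\partial f_i(x_t), \quad \bar h_{I_t}=\frac{1}{|I_t|}\sum_{i\in I_t} h_i,
\]
\[
x_{t+1} = v_t - \frac{\beta}{|J_t|}\sum_{j\in J_t} \frac{\big(g_j(v_t)\big)_+}{\|d_j\|^2}\, d_j,
\qquad d_j\in\partial g_j(v_t),
\]
where \((\cdot)_+=\max\{\cdot,0\}\) and the \(j\)-th term of the second sum is taken as zero whenever \(g_j(v_t)\le 0\). The first step is the mini-batch stochastic proximal subgradient step on the objective; the second is a mini-batch Polyak-type subgradient projection step that decreases the feasibility violation, in the spirit of the method of [28].
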
MSC:

65K05 Numerical mathematical programming methods
90C15 Stochastic programming
90C25 Convex programming

References:

[1] Bhattacharyya, C, Grate, LR, Jordan, MI, et al. Robust sparse hyperplane classifiers: application to uncertain molecular profiling data. J Comput Biol. 2004;11(6):1073-1089.
[2] Vapnik, V. Statistical learning theory. John Wiley; 1998. · Zbl 0935.62007
[3] Nedelcu, V, Necoara, I, Tran Dinh, Q. Computational complexity of inexact gradient augmented Lagrangian methods: application to constrained MPC. SIAM J Control Optim. 2014;52(5):3109-3134. · Zbl 1322.90066
[4] Necoara, I. General convergence analysis of stochastic first order methods for composite optimization. J Optim Theory Appl. 2021;189(1):66-95. · Zbl 1471.65055
[5] Tibshirani, R. The solution path of the generalized Lasso [PhD thesis]. Stanford Univ.; 2011. · Zbl 1234.62107
[6] Rockafellar, RT, Uryasev, SP. Optimization of conditional value-at-risk. J Risk. 2000;2(3):21-41.
[7] Jacob, L, Obozinski, G, Vert, JP. Group Lasso with overlap and graph Lasso. In: International conference on machine learning; 2009. p. 433-440.
[8] Yin, X, Büyüktahtakın, İ. A multi-stage stochastic programming approach to epidemic resource allocation with equity considerations. Health Care Manag Sci. 2021;24(3):597-622.
[9] Zafar, M, Valera, I, Gomez-Rodriguez, M, et al. Fairness constraints: a flexible approach for fair classification. J Mach Learn Res. 2019;20(1):2737-2778. · Zbl 1489.68263
[10] Rosasco, L, Villa, S, Vu, BC. Convergence of stochastic proximal gradient algorithm. Appl Math Optim. 2020;82(3):891-917. · Zbl 1465.90101
[11] Hardt, M, Recht, B, Singer, Y. Train faster, generalize better: stability of stochastic gradient descent. In: International conference on machine learning; 2016.
[12] Nemirovski, A, Yudin, DB. Problem complexity and method efficiency in optimization. New York: Wiley Interscience; 1983. · Zbl 0501.90062
[13] Robbins, H, Monro, S. A stochastic approximation method. Ann Math Stat. 1951;22(3):400-407. · Zbl 0054.05901
[14] Moulines, E, Bach, F. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In: Advances in neural information processing systems conference; 2011.
[15] Nemirovski, A, Juditsky, A, Lan, G, et al. Robust stochastic approximation approach to stochastic programming. SIAM J Optim. 2009;19(4):1574-1609. · Zbl 1189.90109
[16] Patrascu, A, Necoara, I. Nonasymptotic convergence of stochastic proximal point algorithms for constrained convex optimization. J Mach Learn Res. 2018;18(198):1-42. · Zbl 1475.90048
[17] Asi, H, Chadha, K, Cheng, G, et al. Minibatch stochastic approximate proximal point methods. In: Advances in neural information processing systems conference; 2020.
[18] Duchi, J, Singer, Y. Efficient online and batch learning using forward backward splitting. J Mach Learn Res. 2009;10:2899-2934. · Zbl 1235.62151
[19] Nedich, A, Necoara, I. Random minibatch subgradient algorithms for convex problems with functional constraints. Appl Math Optim. 2019;80(3):801-833. · Zbl 1435.90107
[20] Peng, X, Li, L, Wang, F. Accelerating minibatch stochastic gradient descent using typicality sampling. IEEE Trans Neural Netw Learn Syst; 2019.
[21] Gower, R, Nicolas, L, Xun, Q, et al. SGD: general analysis and improved rates. In: International conference on machine learning; 2019.
[22] Renegar, J, Zhou, S. A different perspective on the stochastic convex feasibility problem; 2021. arXiv preprint: 2108.12029v1.
[23] Polyak, BT, Juditsky, AB. Acceleration of stochastic approximation by averaging. SIAM J Control Optim. 1992;30(4):838-855. · Zbl 0762.62022
[24] Yang, T, Lin, Q. RSG: beating subgradient method without smoothness and strong convexity. J Mach Learn Res. 2018;19(6):1-33. · Zbl 1444.90092
[25] Gorbunov, E, Hanzely, F, Richtarik, P. A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent. In: Proceedings of the 23rd international conference on artificial intelligence and statistics; vol. 108, 2020.
[26] Johnson, R, Zhang, T. Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in neural information processing systems; 2013. p. 315-323.
[27] Lin, H, Mairal, J, Harchaoui, Z. A universal catalyst for first-order optimization. In: Advances in neural information processing systems conference; 2015.
[28] Necoara, I, Singh, NK. Stochastic subgradient projection methods for composite optimization with functional constraints. J Mach Learn Res. 2022;23:1-35.
[29] Necoara, I, Nesterov, Y, Glineur, F. Linear convergence of first order methods for non-strongly convex optimization. Math Program. 2019;175(1):69-107. · Zbl 1412.90111
[30] Lewis, A, Pang, JS. Error bounds for convex inequality systems. In: Crouzeix J-P, Martinez-Legaz J-E, Volle M, editors. Generalized convexity, generalized monotonicity. Cambridge University Press; 1998. p. 75-110. · Zbl 0953.90048
[31] Nesterov, Y. Lectures on convex optimization, vol. 137. Springer Optimization and Its Applications; 2018. · Zbl 1427.90003
[32] Richtarik, P, Takac, M. On optimal probabilities in stochastic coordinate descent methods. Optim Lett. 2016;10(6):1233-1243. · Zbl 1353.90148
[33] Polyak, BT. Minimization of unsmooth functionals. USSR Comput Math Math Phys. 1969;9(3):14-29. · Zbl 0229.65056
[34] Gaines, BR, Kim, J, Zhou, H. Algorithms for fitting the constrained Lasso. J Comput Graph Stat. 2018;27(4):861-871. · Zbl 07498997
[35] Hu, Q, Zeng, P, Lin, L. The dual and degrees of freedom of linearly constrained generalized Lasso. Comput Stat Data Anal. 2015;86:13-26. · Zbl 1468.62084
[36] James, GM, Paulson, C, Rusmevichientong, P. Penalized and constrained optimization: an application to high-dimensional website advertising. SIAM J Optim. 2019;30(4):3230-3251.
[37] Grant, M, Boyd, S. CVX: matlab software for disciplined convex programming, version 2.0 beta; 2013. Available from: http://cvxr.com/cvx.