
Stochastic multilevel composition optimization algorithms with level-independent convergence rates. (English) Zbl 1491.90155

Summary: In this paper, we study smooth stochastic multilevel composition optimization problems, where the objective function is a nested composition of \(T\) functions. We assume access to noisy evaluations of the functions and their gradients through a stochastic first-order oracle. For solving this class of problems, we propose two algorithms based on moving-average stochastic estimates and analyze their convergence to an \(\epsilon\)-stationary point of the problem. We show that the first algorithm, which generalizes [S. Ghadimi et al., SIAM J. Optim. 30, No. 1, 960–979 (2020; Zbl 1441.90106)] to the \(T\)-level case, achieves a sample complexity of \(\mathcal{O}_T(1/\epsilon^6)\) by using minibatches of samples in each iteration, where \(\mathcal{O}_T\) hides constants that depend on \(T\). By modifying this algorithm with linearized stochastic estimates of the function values, we improve the sample complexity to \(\mathcal{O}_T(1/\epsilon^4)\). This modification not only removes the need for a minibatch of samples in each iteration, but also makes the algorithm parameter-free and easy to implement. To the best of our knowledge, this is the first time that such an online algorithm designed for the (un)constrained multilevel setting attains the same sample complexity as in the smooth single-level setting, under standard assumptions (unbiasedness and boundedness of the second moments) on the stochastic first-order oracle.
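
To make the moving-average idea concrete, the following is a minimal, illustrative Python sketch for the simplest two-level case \(F(x) = f(g(x))\), with \(f(u) = \frac{1}{2}\|u\|^2\) and \(g(x) = Ax - b\). It is not the paper's \(T\)-level algorithm (and omits the linearized estimates that yield the improved complexity); the oracles, problem data, and constant step sizes alpha and beta are assumptions chosen for illustration. The inner value estimate u and the gradient estimate z are updated as moving averages of noisy oracle outputs, in the spirit of the single-timescale scheme of Ghadimi et al. cited above.

# A minimal sketch (not the authors' exact method) of moving-average
# stochastic estimates for a two-level composition F(x) = f(g(x)).
# All problem data, noise levels, and step sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 3
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

def g_oracle(x, sigma=0.1):
    # Noisy evaluation of the inner map g(x) = A x - b and its Jacobian
    # (exact here; in general the Jacobian would also be stochastic).
    return A @ x - b + sigma * rng.standard_normal(m), A

def grad_f_oracle(u, sigma=0.1):
    # Noisy gradient of the outer function f(u) = 0.5 ||u||^2.
    return u + sigma * rng.standard_normal(m)

x = np.zeros(d)
u = np.zeros(m)          # moving-average estimate of the inner value g(x)
z = np.zeros(d)          # moving-average estimate of the full gradient

alpha, beta = 0.05, 0.2  # illustrative constant step sizes
for k in range(2000):
    g_val, J = g_oracle(x)
    # Moving-average ("tracking") update of the inner function value:
    u = (1 - beta) * u + beta * g_val
    # Moving-average update of the composed gradient J(x)^T grad f(u):
    z = (1 - beta) * z + beta * (J.T @ grad_f_oracle(u))
    x = x - alpha * z

print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
print("gradient norm  :", np.linalg.norm(A.T @ (A @ x - b)))

Because u and z average over past iterations, a single sample per iteration suffices; the minibatch requirement discussed in the summary arises only for the first, non-linearized \(T\)-level algorithm.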

MSC:

90C30 Nonlinear programming
90C26 Nonconvex programming, global optimization
90C15 Stochastic programming
49M37 Numerical methods based on nonlinear programming

Citations:

Zbl 1441.90106

Software:

SemiPar

References:

[1] A. Anastasiou, K. Balasubramanian, and M. Erdogdu, Normal approximation for stochastic gradient descent via non-asymptotic rates of martingale CLT, in Proceedings of the Conference on Learning Theory, 2019, pp. 115-137.
[2] Y. Arjevani, Y. Carmon, J. Duchi, D. Foster, N. Srebro, and B. Woodworth, Lower Bounds for Non-Convex Stochastic Optimization, preprint, arXiv:1912.02365, 2019.
[3] J. Blanchet, D. Goldfarb, G. Iyengar, F. Li, and C. Zhou, Unbiased Simulation for Optimizing Stochastic Function Compositions, preprint, arXiv:1711.07564, 2017.
[4] A. Bora, A. Jalal, E. Price, and A. G. Dimakis, Compressed sensing using generative models, in Proceedings of the 34th International Conference on Machine Learning, Vol. 70, JMLR, 2017, pp. 537-546.
[5] V. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Texts Read. Math. 48, Springer, New York, 2009. · Zbl 1181.62119
[6] T. Chen, Y. Sun, and W. Yin, Solving stochastic compositional optimization is nearly as easy as solving stochastic optimization, IEEE Trans. Signal Process., 69 (2021), pp. 4937-4948. · Zbl 07591674
[7] W. Cong, R. Forsati, M. Kandemir, and M. Mahdavi, Minimal variance sampling with provable guarantees for fast training of graph neural networks, in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2020, pp. 1393-1403.
[8] D. Davis and D. Drusvyatskiy, Stochastic model-based minimization of weakly convex functions, SIAM J. Optim., 29 (2019), pp. 207-239. · Zbl 1415.65136
[9] D. Dentcheva, S. Penev, and A. Ruszczyński, Statistical estimation of composite risk functionals and risk optimization problems, Ann. Inst. Statist. Math., 69 (2017), pp. 737-760. · Zbl 1447.62119
[10] A. Dieuleveut, A. Durmus, and F. Bach, Bridging the gap between constant step size stochastic gradient descent and Markov chains, Ann. Statist., 48 (2020), pp. 1348-1382. · Zbl 1454.62242
[11] Y. Drori and O. Shamir, The complexity of finding stationary points with stochastic gradient descent, in Proceedings of the 37th International Conference on Machine Learning, Vol. 119, 2020.
[12] D. Drusvyatskiy and A. Lewis, Error bounds, quadratic growth, and linear convergence of proximal methods, Math. Oper. Res., 43 (2018), pp. 919-948. · Zbl 1440.90046
[13] J. Duchi and F. Ruan, Stochastic methods for composite and weakly convex optimization problems, SIAM J. Optim., 28 (2018), pp. 3229-3259. · Zbl 06989166
[14] Y. Ermoliev, Methods of Stochastic Programming, Nauka, Moscow, 1976. · Zbl 0484.90070
[15] Y. Ermoliev and V. Norkin, Sample average approximation method for compound stochastic optimization problems, SIAM J. Optim., 23 (2013), pp. 2231-2263. · Zbl 1295.90025
[16] C. Fang, C. J. Li, Z. Lin, and T. Zhang, Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator, in Advances in Neural Information Processing Systems, 2018, pp. 689-699.
[17] S. Ghadimi and G. Lan, Stochastic first- and zeroth-order methods for nonconvex stochastic programming, SIAM J. Optim., 23 (2013), pp. 2341-2368. · Zbl 1295.90026
[18] S. Ghadimi and G. Lan, Accelerated gradient methods for nonconvex nonlinear and stochastic programming, Math. Program., 156 (2016), pp. 59-99. · Zbl 1335.62121
[19] S. Ghadimi, G. Lan, and H. Zhang, Mini-batch stochastic approximation methods for constrained nonconvex stochastic programming, Math. Program., 155 (2016), pp. 267-305. · Zbl 1332.90196
[20] S. Ghadimi, A. Ruszczyński, and M. Wang, A single timescale stochastic approximation method for nested stochastic optimization, SIAM J. Optim., 30 (2020), pp. 960-979. · Zbl 1441.90106
[21] H. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, Stoch. Model. Appl. Probab. 35, Springer, New York, 2003. · Zbl 1026.62084
[22] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer, Norwell, MA, 2004. · Zbl 1086.90045
[23] Y. Nesterov, Gradient methods for minimizing composite objective functions, Math. Program., 140 (2013), pp. 125-161. · Zbl 1287.90067
[24] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, Deep learning techniques for inverse problems in imaging, IEEE J. Sel. Areas Inf. Theory, 1 (2020), pp. 39-56.
[25] B. Polyak and A. Juditsky, Acceleration of stochastic approximation by averaging, SIAM J. Control Optim., 30 (1992), pp. 838-855. · Zbl 0762.62022
[26] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), pp. 400-407. · Zbl 0054.05901
[27] D. Ruppert, Efficient Estimations from a Slowly Convergent Robbins-Monro Process, Tech. report, Operations Research and Industrial Engineering, Cornell University, 1988.
[28] D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Camb. Ser. Stat. 12, Cambridge University Press, Cambridge, UK, 2003. · Zbl 1038.62042
[29] A. Ruszczyński, A linearization method for nonsmooth stochastic programming problems, Math. Oper. Res., 12 (1987), pp. 32-49. · Zbl 0624.90078
[30] A. Ruszczyński, Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization, Optim. Lett., 14 (2020), pp. 1615-1625. · Zbl 1459.90168
[31] A. Ruszczyński, A stochastic subgradient method for nonsmooth nonconvex multilevel composition optimization, SIAM J. Control Optim., 59 (2021), pp. 2301-2320. · Zbl 1470.90096
[32] A. Ruszczyński and A. Shapiro, Optimization of convex risk functions, Math. Oper. Res., 31 (2006), pp. 433-452. · Zbl 1278.90283
[33] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on Stochastic Programming: Modeling and Theory, SIAM, Philadelphia, 2014. · Zbl 1302.90003
[34] M. Wang, E. Fang, and H. Liu, Stochastic compositional gradient descent: Algorithms for minimizing compositions of expected-value functions, Math. Program., 161 (2017), pp. 419-449. · Zbl 1356.90099
[35] M. Wang, J. Liu, and E. Fang, Accelerating stochastic composition optimization, in Advances in Neural Information Processing Systems, 2016. · Zbl 1441.62217
[36] S. Yang, M. Wang, and E. Fang, Multilevel stochastic gradient methods for nested composition optimization, SIAM J. Optim., 29 (2019), pp. 616-659. · Zbl 1414.90251
[37] L. Yu, K. Balasubramanian, S. Volgushev, and M. Erdogdu, An analysis of constant step size SGD in the non-convex regime: Asymptotic normality and bias, in Advances in Neural Information Processing Systems, 2021.
[38] J. Zhang and L. Xiao, Multilevel composite stochastic optimization via nested variance reduction, SIAM J. Optim., 31 (2021), pp. 1131-1157. · Zbl 1514.90174