
A framework of convergence analysis of mini-batch stochastic projected gradient methods. (English) Zbl 1524.62385

Summary: In this paper, we establish a unified framework for studying the almost sure global convergence and the expected convergence rates of a class of mini-batch stochastic (projected) gradient (SG) methods, including two popular variants: SG with diminishing stepsizes and SG with increasing batch sizes. We also show that the standard uniformly bounded variance assumption, which is frequently used in the literature to investigate the convergence of SG, is in fact not required when the gradient of the objective function is Lipschitz continuous. Finally, we show that our framework can also be used to analyze the convergence of a mini-batch stochastic extragradient method for stochastic variational inequalities.
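To make the two schemes concrete, the following is a minimal, illustrative sketch (not the authors' code) of mini-batch stochastic projected gradient descent on a box-constrained least-squares problem. It contrasts the two variants discussed in the summary: a fixed mini-batch size with diminishing stepsizes, and a fixed stepsize with increasing batch sizes. All names and constants below (project_box, alpha0, m0, growth, etc.) are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.standard_normal((n, d))
x_true = rng.uniform(-1.0, 1.0, d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d."""
    return np.clip(x, lo, hi)

def minibatch_grad(x, batch_size):
    """Unbiased stochastic gradient of f(x) = (1/2n)||Ax - b||^2 from a random mini-batch."""
    idx = rng.integers(0, n, size=batch_size)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / batch_size

def sg_diminishing_stepsize(iters=2000, batch_size=8, alpha0=0.5):
    # Variant 1: fixed mini-batch size, stepsize alpha_k = alpha0 / (k + 1).
    x = np.zeros(d)
    for k in range(iters):
        g = minibatch_grad(x, batch_size)
        x = project_box(x - alpha0 / (k + 1) * g)
    return x

def sg_increasing_batch(iters=200, alpha=0.1, m0=4, growth=1.05):
    # Variant 2: fixed stepsize, batch size m_k grows geometrically with k.
    x = np.zeros(d)
    for k in range(iters):
        m_k = int(m0 * growth ** k)
        g = minibatch_grad(x, min(m_k, n))
        x = project_box(x - alpha * g)
    return x

print("diminishing stepsize :", sg_diminishing_stepsize())
print("increasing batch size:", sg_increasing_batch())
print("ground truth         :", x_true)
```

The stochastic extragradient method mentioned at the end of the summary follows the same projected-update pattern but performs two projected steps per iteration: a first step using a stochastic gradient (or operator) sample at the current iterate, and a second step from the current iterate using a sample evaluated at the intermediate point.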

MSC:

62L20 Stochastic approximation
90C25 Convex programming
90C15 Stochastic programming

Software:

Saga
Full Text: DOI
