
A sparsity preserving stochastic gradient methods for sparse regression. (English) Zbl 1401.62129

Summary: We propose a new stochastic first-order algorithm for solving sparse regression problems. In each iteration, our algorithm utilizes a stochastic oracle for the subgradient of the objective function. Our algorithm is based on a stochastic version of the estimate sequence technique introduced by Y. Nesterov [Introductory lectures on convex optimization. A basic course. Boston: Kluwer Academic Publishers (2004; Zbl 1086.90045)]. The convergence rate of our algorithm depends continuously on the noise level of the gradient. In particular, in the limiting case of a noiseless gradient, the convergence rate of our algorithm is the same as that of optimal deterministic gradient algorithms. We also establish some large deviation properties of our algorithm. Unlike existing stochastic gradient methods with optimal convergence rates, our algorithm has the advantage of readily enforcing sparsity at all iterations, which is a critical property for applications of sparse regression.
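The sparsity-at-every-iteration property highlighted above is what distinguishes prox-type updates from plain averaged stochastic gradient schemes, whose iterates are typically dense. As a point of reference only, and not the paper's estimate-sequence algorithm, the following minimal sketch shows how an ℓ1 proximal (soft-thresholding) step after each stochastic gradient update yields exactly sparse iterates for a lasso problem; the function names and parameter values are hypothetical choices for the illustration.

import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1: shrinks coordinates toward zero
    # and sets small ones exactly to zero, which keeps iterates sparse.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def sparse_stochastic_prox_grad(A, b, lam, step, n_iters, batch=10, seed=0):
    # Stochastic proximal gradient for the lasso
    #     min_x (1/(2n)) * ||Ax - b||^2 + lam * ||x||_1,
    # using a mini-batch gradient of the smooth part followed by the
    # l1 proximal step, so every iterate (not just the output) is sparse.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iters):
        idx = rng.integers(0, n, size=batch)             # sample a mini-batch
        Ai, bi = A[idx], b[idx]
        grad = Ai.T @ (Ai @ x - bi) / batch              # stochastic gradient estimate
        x = soft_threshold(x - step * grad, step * lam)  # prox step enforces zeros
    return x

# Toy usage: recover a 5-sparse vector from noisy linear measurements.
rng = np.random.default_rng(1)
n, d = 200, 50
x_true = np.zeros(d)
x_true[:5] = rng.normal(size=5)
A = rng.normal(size=(n, d))
b = A @ x_true + 0.01 * rng.normal(size=n)
x_hat = sparse_stochastic_prox_grad(A, b, lam=0.1, step=0.01, n_iters=2000)
print("nonzeros in estimate:", np.count_nonzero(x_hat))

With step size 0.01 and lam = 0.1, the per-iteration thresholding level is 0.001, so spurious coordinates are driven to exact zero while the large true coefficients survive; a plain SGD or dual-averaging update on the same problem would typically leave all 50 coordinates nonzero.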

MSC:

62L20 Stochastic approximation
90C15 Stochastic programming
90C25 Convex programming

Citations:

Zbl 1086.90045

References:

[1] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183-202 (2009) · Zbl 1175.94009 · doi:10.1137/080716542
[2] Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25, 463-483 (1954) · Zbl 0059.13203 · doi:10.1214/aoms/1177728716
[3] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better mini-batch algorithms via accelerated gradient methods. In: Advances in Neural Information Processing Systems 24 (NIPS) (2011)
[4] Dekel, O., Gilad-Bachrach, R., Shamir, O., Xiao, L.: Optimal distributed online prediction using mini-batches. J. Mach. Learn. Res. 13, 165-202 (2012) · Zbl 1283.68404
[5] Duchi, J., Singer, Y.: Efficient online and batch learning using forward-backward splitting. J. Mach. Learn. Res. 10, 2899-2934 (2009) · Zbl 1235.68157
[6] Duchi, J.C., Bartlett, P.L., Wainwright, M.J.: Randomized smoothing for stochastic optimization. SIAM J. Optim. 22(2), 674-701 (2012) · Zbl 1267.65063 · doi:10.1137/110831659
[7] Ermoliev, Y.: Stochastic quasigradient methods and their application to system optimization. Stochastics 9, 1-36 (1983) · Zbl 0512.90079 · doi:10.1080/17442508308833246
[8] Gaivoronski, A.A.: Nonstationary stochastic programming problems. Kybernetika 4, 89-92 (1978) · Zbl 0394.90072
[9] Hu, C., Kwok, J.T., Pan, W.: Accelerated gradient methods for stochastic optimization and online learning. In: Advances in Neural Information Processing Systems 22 (NIPS) (2009)
[10] Kiefer, J., Wolfowitz, J.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23, 462-466 (1952) · Zbl 0049.36601 · doi:10.1214/aoms/1177729392
[11] Kushner, H.J., Yin, G.G.: Stochastic Approximation Algorithms and Applications. Springer, New York (2003) · Zbl 1026.62084
[12] Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133(1), 365-397 (2012) · Zbl 1273.90136 · doi:10.1007/s10107-010-0434-y
[13] Lan, G., Ghadimi, S.: Optimal stochastic approximation algorithm for strongly convex stochastic composite optimization, I: a generic algorithmic framework. SIAM J. Optim. 22(4), 1469-1492 (2012) · Zbl 1301.62077 · doi:10.1137/110848864
[14] Lan, G., Ghadimi, S.: Optimal stochastic approximation algorithm for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4), 2061-2069 (2013) · Zbl 1293.62167 · doi:10.1137/110848876
[15] Langford, J., Li, L., Zhang, T.: Sparse online learning via truncated gradient. J. Mach. Learn. Res. 10, 777-801 (2009)
[16] Lee, S., Wright, S.J.: Manifold identification in dual averaging methods for regularized stochastic online learning. J. Mach. Learn. Res. 13, 1705-1744 (2012) · Zbl 1432.68392
[17] Liu, J., Ji, S., Ye, J.: Multi-task feature learning via efficient ℓ2,1-norm minimization. In: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI) (2009)
[18] Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574-1609 (2009) · Zbl 1189.90109 · doi:10.1137/070704277
[19] Nemirovski, A., Yudin, D.: On Cezari's convergence of the steepest descent method for approximating saddle points of convex-concave functions. Sov. Math. Dokl. 19 (1978)
[20] Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983) · Zbl 0501.90062
[21] Nesterov, Y.E.: Gradient methods for minimizing composite objective function. Technical report, CORE (2007)
[22] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston (2004) · Zbl 1086.90045
[23] Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127-152 (2005) · Zbl 1079.90102 · doi:10.1007/s10107-004-0552-5
[24] Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120, 221-259 (2009) · Zbl 1191.90038 · doi:10.1007/s10107-007-0149-x
[25] Pflug, G.C.: Optimization of Stochastic Models, the Interface Between Simulation and Optimization. Kluwer, Boston (1996) · Zbl 0909.90220
[26] Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30, 838-855 (1992) · Zbl 0762.62022 · doi:10.1137/0330046
[27] Polyak, B.T.: New stochastic approximation type procedures. Automat. Telemekh. 7, 98-107 (1990)
[28] Pong, T.K., Tseng, P., Ji, S., Ye, J.: Trace norm regularization: reformulations, algorithms, and multi-task learning. SIAM J. Optim. 20(6), 3465-3489 (2010) · Zbl 1211.90129 · doi:10.1137/090763184
[29] Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400-407 (1951) · Zbl 0054.05901 · doi:10.1214/aoms/1177729586
[30] Ruszczynski, A., Syski, W.: A method of aggregate stochastic subgradients with on-line stepsize rules for convex stochastic programming problems. Math. Program. Stud. 28, 113-131 (1986) · Zbl 0597.90064 · doi:10.1007/BFb0121128
[31] Sacks, J.: Asymptotic distribution of stochastic approximation procedures. Ann. Math. Stat. 29, 373-405 (1958) · Zbl 0229.62010 · doi:10.1214/aoms/1177706619
[32] Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267-288 (1996) · Zbl 0850.62538
[33] Toh, K.-C., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pac. J. Optim. 6, 615-640 (2010) · Zbl 1205.90218
[34] Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Technical report, University of Washington (2008)
[35] Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543-2596 (2010) · Zbl 1242.62011
[36] Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49-67 (2006) · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x