Block splitting for distributed optimization. (English) Zbl 1305.90291

Summary: This paper describes a general-purpose method for solving convex optimization problems in a distributed computing environment. In particular, if the problem data include a large linear operator or matrix \(A\), the method allows each sub-block of \(A\) to be handled on a separate machine. The approach works as follows. First, we define a canonical problem form called graph form, in which we have two sets of variables related by a linear operator \(A\), such that the objective function is separable across these two sets of variables. Many types of problems are easily expressed in graph form, including cone programs and a wide variety of regularized loss minimization problems from statistics, such as logistic regression, the support vector machine, and the lasso. Next, we describe graph projection splitting, a form of Douglas-Rachford splitting or the alternating direction method of multipliers, to solve graph form problems serially. Finally, we derive a distributed block splitting algorithm based on graph projection splitting. In a statistical or machine learning context, this allows models to be trained exactly with a huge number of both training examples and features, such that each processor handles only a subset of both. To the best of our knowledge, this is the only general-purpose method with this property. We present several numerical experiments in both the serial and distributed settings.
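
To make graph form and graph projection splitting concrete, the following is a minimal serial sketch in Python/NumPy, specialized to the lasso with \(f(y) = \frac{1}{2}\|y - b\|_2^2\) and \(g(x) = \lambda\|x\|_1\). The function name, the fixed penalty parameter \(\rho = 1\), and the fixed iteration count are illustrative assumptions rather than choices taken from the paper; a real implementation would add a stopping test on the primal and dual residuals.

    import numpy as np

    def graph_projection_splitting(A, b, lam, rho=1.0, iters=300):
        # Solve the lasso  minimize (1/2)||Ax - b||^2 + lam*||x||_1
        # in graph form:   minimize f(y) + g(x)  subject to  y = Ax,
        # with f(y) = (1/2)||y - b||^2 and g(x) = lam*||x||_1.
        m, n = A.shape
        x, y = np.zeros(n), np.zeros(m)      # primal iterates
        xt, yt = np.zeros(n), np.zeros(m)    # scaled dual iterates
        # Factor I + A^T A once; every graph projection reuses it.
        L = np.linalg.cholesky(np.eye(n) + A.T @ A)
        for _ in range(iters):
            # 1. Separate prox steps for g and f (both with parameter rho):
            #    soft-thresholding for the l1 term, shrinkage toward b for f.
            x_half = np.sign(x - xt) * np.maximum(np.abs(x - xt) - lam / rho, 0.0)
            y_half = (rho * (y - yt) + b) / (rho + 1.0)
            # 2. Project (x_half + xt, y_half + yt) onto the graph {(x, y): y = Ax}:
            #    x := (I + A^T A)^{-1} (c + A^T d),  y := A x.
            c, d = x_half + xt, y_half + yt
            x = np.linalg.solve(L.T, np.linalg.solve(L, c + A.T @ d))
            y = A @ x
            # 3. Update the scaled dual variables with the step residuals.
            xt += x_half - x
            yt += y_half - y
        return x

The structural point that the distributed block splitting algorithm exploits is that steps 1 and 3 are separable across the components of \(x\) and \(y\), while step 2 touches \(A\) only through a projection onto its graph; partitioning \(A\) into sub-blocks turns that single projection into independent per-block projections plus cheap averaging steps, with one block per machine.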

MSC:

90C06 Large-scale problems in mathematical programming
90C22 Semidefinite programming
90C25 Convex programming
90C30 Nonlinear programming

Software:

CVX; POGS; CSparse; AMD; SeDuMi; SDPT3; LDL; ATLAS
Full Text: DOI

References:

[1] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011) · Zbl 1229.90122 · doi:10.1561/2200000016
[2] Luenberger, D.G.: Optimization by Vector Space Methods. Wiley, New York (1969) · Zbl 0176.12701
[3] Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia (1994) · Zbl 0824.90112
[4] Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, Philadelphia (2001) · Zbl 0986.90032
[5] Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004) · Zbl 1058.90049
[6] Wen, Z., Goldfarb, D., Yin, W.: Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Program. Comput. 2(3), 203–230 (2010) · Zbl 1206.90088 · doi:10.1007/s12532-010-0017-1
[7] Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996) · Zbl 0850.62538
[8] Intensity Modulated Radiation Therapy Collaborative Working Group: Intensity-modulated radiotherapy: current status and issues of interest. Int. J. Radiat. Oncol. Biol. Phys. 51(4), 880–914 (2001)
[9] Webb, S.: Intensity-Modulated Radiation Therapy. Taylor & Francis (2001)
[10] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. C. R. Acad. Sci. Paris Sér. A 255, 2897–2899 (1962) · Zbl 0118.10502
[11] Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 1–108 (2013). To appear
[12] Eckstein, J., Ferris, M.C.: Operator-splitting methods for monotone affine variational inequalities, with a parallel application to optimal control. INFORMS J. Comput. 10(2), 218–235 (1998) · Zbl 1034.90531 · doi:10.1287/ijoc.10.2.218
[13] Yang, J., Zhang, Y.: Alternating direction algorithms for \(\ell_1\)-problems in compressive sensing. SIAM J. Sci. Comput. 33(1), 250–278 (2011) · Zbl 1256.65060 · doi:10.1137/090777761
[14] Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Statistical Methodology) 68(1), 49–67 (2006) · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[15] Ohlsson, H., Ljung, L., Boyd, S.: Segmentation of ARX-models using sum-of-norms regularization. Automatica 46(6), 1107–1111 (2010) · Zbl 1191.93139 · doi:10.1016/j.automatica.2010.03.013
[16] Agarwal, A., Chapelle, O., Dudik, M., Langford, J.: A reliable effective terascale linear learning system. arXiv:1110.4198 (2011)
[17] Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming (2008). http://cvxr.com/cvx
[18] Sturm, J.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11(1–4), 625–653 (1999) · Zbl 0973.90526 · doi:10.1080/10556789908805766
[19] Toh, K.C., Todd, M.J., Tütüncü, R.H.: SDPT3: a MATLAB software package for semidefinite programming, version 1.3. Optim. Methods Softw. 11(1–4), 545–581 (1999) · Zbl 0997.90060 · doi:10.1080/10556789908805762
[20] CVX Research, Inc.: CVX: Matlab software for disciplined convex programming, version 2.0 beta, example library. http://cvxr.com/cvx/examples (2012)
[21] Whaley, R.C., Dongarra, J.J.: Automatically tuned linear algebra software. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing (CDROM), pp. 1–27 (1998)
[22] Vanderbei, R.J.: Symmetric quasi-definite matrices. SIAM J. Optim. 5(1), 100–113 (1995) · Zbl 0822.65017 · doi:10.1137/0805005
[23] Saunders, M.A.: Cholesky-based methods for sparse least squares: the benefits of regularization. In: Adams, L., Nazareth, J.L. (eds.) Linear and Nonlinear Conjugate Gradient-Related Methods, pp. 92–100. SIAM, Philadelphia (1996) · Zbl 0865.65022
[24] Davis, T.A.: Algorithm 849: a concise sparse Cholesky factorization package. ACM Trans. Math. Softw. 31(4), 587–591 (2005) · Zbl 1136.65311 · doi:10.1145/1114268.1114277
[25] Amestoy, P., Davis, T.A., Duff, I.S.: An approximate minimum degree ordering algorithm. SIAM J. Matrix Anal. Appl. 17(4), 886–905 (1996) · Zbl 0861.65021 · doi:10.1137/S0895479894278952
[26] Amestoy, P., Davis, T.A., Duff, I.S.: Algorithm 837: AMD, an approximate minimum degree ordering algorithm. ACM Trans. Math. Softw. 30(3), 381–388 (2004) · Zbl 1070.65534 · doi:10.1145/1024074.1024081
[27] Davis, T.A.: Direct Methods for Sparse Linear Systems. SIAM, Philadelphia (2006) · Zbl 1119.65021
[28] Benzi, M., Meyer, C.D., Tuma, M.: A sparse approximate inverse preconditioner for the conjugate gradient method. SIAM J. Sci. Comput. 17(5), 1135–1149 (1996) · Zbl 0856.65019 · doi:10.1137/S1064827594271421
[29] Benzi, M.: Preconditioning techniques for large linear systems: a survey. J. Comput. Phys. 182(2), 418–477 (2002) · Zbl 1015.65018 · doi:10.1006/jcph.2002.7176
[30] Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996) · Zbl 0865.65009
[31] Wilkinson, J.H.: Rounding Errors in Algebraic Processes. Dover (1963) · Zbl 1041.65502
[32] Moler, C.B.: Iterative refinement in floating point. J. ACM 14(2), 316–321 (1967) · Zbl 0161.35501 · doi:10.1145/321386.321394