×

Choose your path wisely: gradient descent in a Bregman distance framework. (English) Zbl 1480.90197

Summary: We propose an extension of a special form of gradient descent-in the literature known as linearized Bregman iteration-to a larger class of nonconvex functions. We replace the classical (squared) two norm metric in the gradient descent setting with a generalized Bregman distance, based on a proper, convex, and lower semicontinuous function. The algorithm’s global convergence is proven for functions that satisfy the Kurdyka-Łojasiewicz property. Examples illustrate that features of different scale are being introduced throughout the iteration, transitioning from coarse to fine. This coarse-to-fine approach with respect to scale allows us to recover solutions of nonconvex optimization problems that are superior to those obtained with conventional gradient descent, or even projected and proximal gradient descent. The effectiveness of the linearized Bregman iteration in combination with early stopping is illustrated for the applications of parallel magnetic resonance imaging, blind deconvolution, as well as image classification with neural networks.

MSC:

90C26 Nonconvex programming, global optimization
65K05 Numerical mathematical programming methods
65K10 Numerical optimization and variational techniques
90C30 Nonlinear programming

Software:

Saga; BADMM; MNIST; iPiano

References:

[1] A. Aizenbud and D. Gourevitch, Schwartz functions on nash manifolds, Int. Math. Res. Not. IMRN, 2008 (2008). · Zbl 1161.58002
[2] H. Attouch and J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Program., 116 (2009), pp. 5-16. · Zbl 1165.90018
[3] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-łojasiewicz inequality, Math. Oper. Res., 35 (2010), pp. 438-457. · Zbl 1214.65036
[4] H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program., 137 (2013), pp. 91-129. · Zbl 1260.49048
[5] M. Bachmayr and M. Burger, Iterative total variation methods for nonlinear inverse problems, Inverse Problems, 25 (2009), 26, https://doi.org/10.1088/0266-5611/25/10/105004. · Zbl 1188.49028
[6] H. H. Bauschke, J. Bolte, and M. Teboulle, A descent lemma beyond Lipschitz gradient continuity: First-order methods revisited and applications, Math. Oper. Res., 42 (2016), pp. 330-348. · Zbl 1364.90251
[7] H. H. Bauschke, J. M. Borwein, and P. L. Combettes, Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces, Commun. Contemp. Math., 3 (2001), pp. 615-647. · Zbl 1032.49025
[8] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books Math. 408, Springer, New York, 2011. · Zbl 1218.47001
[9] S. Becker and J. Fadili, A quasi-newton proximal splitting method, in Advances in Neural Information Processing Systems, 2012, pp. 2618-2626.
[10] M. Benning, M. M. Betcke, M. J. Ehrhardt, and C.-B. Schönlieb, Gradient descent in a generalised Bregman distance framework, in Geometric Numerical Integration and Its Applications, G. R. W. Quispel, P. Bader, D. I. McLaren, and D. Tagami, eds., MI Lecture Notes 74, Institute of Mathematics for Industry, Kyushu University, Fukuoka, Japan, 2017, pp. 40-45, http://www.imi.kyushu-u.ac.jp/eng/files/imipublishattachment/file/math_58ec341a238fe.pdf.
[11] M. Benning, L. Gladden, D. Holland, C.-B. Schönlieb, and T. Valkonen, Phase reconstruction from velocity-encoded MRI measurements-a survey of sparsity-promoting variational approaches, J. Magnetic Resonance, 238 (2014), pp. 26-43.
[12] M. Benning, F. Knoll, C.-B. Schönlieb, and T. Valkonen, Preconditioned ADMM with nonlinear operator constraint, in Proceedings of the IFIP Conference on System Modeling and Optimization, Springer, New York, 2015, pp. 117-126.
[13] D. Bertsekas, On the Goldstein-Levitin-Polyak gradient projection method, IEEE Trans. Automat. Control, 21 (1976), pp. 174-184. · Zbl 0326.49025
[14] D. P. Bertsekas, Incremental gradient, subgradient, and proximal methods for convex optimization: A survey, in Optimization for Machine Learning, S. Sra, S. Nowozin, and S. Wright, eds., MIT Press, Cambridge, MA, 2011, pp. 85-120.
[15] J. Bolte, P. L. Combettes, and J.-C. Pesquet, Alternating proximal algorithm for blind image recovery, in Proceedings of the IEEE International Conference on Image Processing, 2010, pp. 1673-1676.
[16] J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., 146 (2014), pp. 459-494. · Zbl 1297.90125
[17] J. Bolte, S. Sabach, M. Teboulle, and Y. Vaisbourd, First Order Methods Beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems, preprint, arXiv:1706.06461, 2017. · Zbl 1402.90118
[18] S. Bonettini, I. Loris, F. Porta, and M. Prato, Variable metric inexact line-search based methods for nonsmooth optimization, SIAM J. Optim., 26 (2016), pp. 891-921, http://dx.doi.org/10.1137/15M1019325. · Zbl 1338.65157
[19] S. Bonettini, I. Loris, F. Porta, M. Prato, and S. Rebegoldi, On the convergence of a linesearch based proximal-gradient method for nonconvex optimization, Inverse Problems, 33 (2017), http://dx.doi.org/10.1088/1361-6420/aa5bfd. · Zbl 1373.65040
[20] L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Comput. Math. Math. Phys., 7 (1967), pp. 200-217. · Zbl 0186.23807
[21] M. Burger, G. Gilboa, S. Osher, and J. Xu, Nonlinear inverse scale space methods, Commun. Math. Sci., 4 (2006), pp. 179-212. · Zbl 1106.68117
[22] M. Burger, M. Möller, M. Benning, and S. Osher, An adaptive inverse scale space method for compressed sensing, Math. Comp., 82 (2013), pp. 269-299. · Zbl 1260.49053
[23] M. Burger, E. Resmerita, and L. He, Error estimation for Bregman iterations and inverse scale space methods in image restoration, Computing, 81 (2007), pp. 109-135. · Zbl 1147.68790
[24] J.-F. Cai, S. Osher, and Z. Shen, Convergence of the linearized Bregman iteration for \(\ell^1\)-norm minimization, Math. Comp., 78 (2009), pp. 2127-2136. · Zbl 1198.65103
[25] J.-F. Cai, S. Osher, and Z. Shen, Linearized Bregman iterations for compressed sensing, Math. Comp., 78 (2009), pp. 1515-1536. · Zbl 1198.65102
[26] P. Campisi and K. Egiazarian, Blind Image Deconvolution: Theory and Applications, CRC Press, Boca Raton, FL, 2016.
[27] Y. Censor and S. A. Zenios, Proximal minimization algorithm with d-functions, J. Optim. Theory Appl., 73 (1992), pp. 451-464. · Zbl 0794.90058
[28] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging vision, 40 (2011), pp. 120-145. · Zbl 1255.68217
[29] A. Chambolle and T. Pock, An introduction to continuous optimization for imaging, Acta Numer., 25 (2016), pp. 161-319. · Zbl 1343.65064
[30] T. F. Chan and J. Shen, Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods, SIAM, Philadelphia, 2005. · Zbl 1095.68127
[31] T. F. Chan and C.-K. Wong, Total variation blind deconvolution, IEEE Trans. Image Process., 7 (1998), pp. 370-375.
[32] E. Chouzenoux, J.-C. Pesquet, and A. Repetti, A block coordinate variable metric forward-backward algorithm, J. Global Optim., 66 (2016), pp. 457-485. · Zbl 1351.90128
[33] J. Darbon and S. Osher, Fast Discrete Optimization for Sparse Approximations and Deconvolutions, 2007.
[34] A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, arXiv:1407.0202v2, 2014.
[35] D. Drusvyatskiy, A. D. Ioffe, and A. S. Lewis, Nonsmooth Optimization Using Taylor-Like Models: Error Bounds, Convergence, and Termination Criteria, preprint, arXiv:1610.03446, 2016. · Zbl 1459.65083
[36] J. Eckstein, Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming, Math. Oper. Res., 18 (1993), pp. 202-226. · Zbl 0807.47036
[37] E. Esser, X. Zhang, and T. F. Chan, A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science, SIAM J. Imaging Sci., 3 (2010), pp. 1015-1046. · Zbl 1206.90117
[38] K. Frick and O. Scherzer, Regularization of ill-posed linear equations by the non-stationary augmented lagrangian method, J. Integral Equations Appl., 22 (2010), pp. 217-257. · Zbl 1203.90156
[39] D. Gabay, Chapter IX applications of the method of multipliers to variational inequalities, in Augmented Lagrangian Methods, Stud. Math. Appl. 15, Elsevier, Amsterdam, 1983, pp. 299-331.
[40] G. Garrigos, L. Rosasco, and S. Villa, Iterative Regularization via Dual Diagonal Descent, preprint, arXiv:1610.02170, 2016. · Zbl 1425.94013
[41] A. A. Goldstein, Convex programming in Hilbert space, Bull. Amer. Math. Soc., 70 (1964), pp. 709-710. · Zbl 0142.17101
[42] A. A. Goldstein, Constructive Real Analysis, Tech. report, Department of Mathematics, University of Washington, Seattle, 1967. · Zbl 0189.49703
[43] B. Huang, S. Ma, and D. Goldfarb, Accelerated linearized bregman method, J. Sci. Comput., 54 (2013), pp. 428-453. · Zbl 1271.65096
[44] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Proceedings of NIPS, Vol. 1, 2013, pp. 315-323.
[45] B. Kaltenbacher, F. Schöpfer, and T. Schuster, Iterative methods for nonlinear ill-posed problems in banach spaces: Convergence and applications to parameter identification problems, Inverse Problems, 25 (2009), 065003. · Zbl 1176.65070
[46] K. C. Kiwiel, Proximal minimization methods with generalized Bregman functions, SIAM J. Control Optim., 35 (1997), pp. 1142-1168. · Zbl 0890.65061
[47] F. Knoll, K. Bredies, T. Pock, and R. Stollberger, MRI Raw Data: T2 Weighted TSE Scan of a Healthy Volunteer (4 Channel Head Coil), https://doi.org/10.5281/zenodo.800525, 2010.
[48] F. Knoll, C. Clason, K. Bredies, M. Uecker, and R. Stollberger, Parallel imaging with nonlinear reconstruction using variational penalties, Magnetic Resonance in Medicine, 67 (2012), pp. 34-41.
[49] D. Kundur and D. Hatzinakos, Blind image deconvolution, IEEE Signal Process. Mag., 13 (1996), pp. 43-64, https://doi.org/10.1109/79.489268.
[50] K. Kurdyka, On gradients of functions definable in o-minimal structures, Ann. Inst. Fourier, 48 (1998), pp. 769-784. · Zbl 0934.32009
[51] Y. LeCun, C. Cortes, and C. J. Burges, MNIST Handwritten Digit Database, AT&T Labs, 2 (2010), http://yann.lecun.com/exdb/mnist.
[52] G. Li and T. K. Pong, Global convergence of splitting methods for nonconvex composite optimization, SIAM J. Optim., 25 (2015), pp. 2434-2460. · Zbl 1330.90087
[53] J. Liang, J. Fadili, and G. Peyré, A multi-step inertial forward-backward splitting method for non-convex optimization, in Advances in Neural Information Processing Systems, 2016, pp. 4035-4043.
[54] P.-L. Lions and B. Mercier, Splitting algorithms for the sum of two nonlinear operators, SIAM J. Numer. Anal., 16 (1979), pp. 964-979. · Zbl 0426.65050
[55] S. Lojasiewicz, Une propriété topologique des sous-ensembles analytiques réels, Équations Dérivées Partielles, 117 (1963), pp. 87-89. · Zbl 0234.57007
[56] S. Matet, L. Rosasco, S. Villa, and B. L. Vu, Don’t Relax: Early Stopping for Convex Regularization, preprint, arXiv:1707.05422, 2017.
[57] M. Moeller, M. Benning, C. Schönlieb, and D. Cremers, Variational depth from focus reconstruction, IEEE Trans. Image Process., 24 (2015), pp. 5369-5378. · Zbl 1408.94486
[58] J.-J. Moreau, Décomposition orthogonale d’un espace hilbertien selon deux cônes mutuellement polaires, C. R. Acad. Sci. Paris, 225 (1962), pp. 238-240. · Zbl 0109.08105
[59] J.-J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Math. France, 93 (1965), pp. 273-299. · Zbl 0136.12101
[60] V. A. Morozov, Methods for Solving Incorrectly Posed Problems, Springer, New York, 2012.
[61] M. Nikolova and P. Tan, Alternating structure-adapted proximal gradient descent for nonconvex nonsmooth block-regularized problems, SIAM J. Optim., 29 (2019), pp. 2053-2078. · Zbl 1421.90142
[62] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed., Springer, New York, 2006. · Zbl 1104.65059
[63] P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: Inertial proximal algorithm for nonconvex optimization, SIAM J. Imaging Sci., 7 (2014), pp. 1388-1419. · Zbl 1296.90094
[64] P. Ochs, J. Fadili, and T. Brox, Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms, preprint, arXiv:1707.02278, 2017. · Zbl 1416.49015
[65] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, An iterative regularization method for total variation-based image restoration, Multiscale Model. Simul., 4 (2005), pp. 460-489. · Zbl 1090.94003
[66] S. Osher, F. Ruan, J. Xiong, Y. Yao, and W. Yin, Sparse recovery via differential inclusions, Appl. Comput. Harmon. Anal., 41 (2016), pp. 436-469. · Zbl 1360.94090
[67] T. Pock, D. Cremers, H. Bischof, and A. Chambolle, An algorithm for minimizing the Mumford-Shah functional, in Proceedings of the 12th International Conference on Computer Vision, IEEE, 2009, pp. 1133-1140.
[68] T. Pock and S. Sabach, Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems, SIAM J. Imaging Sci., 9 (2016), pp. 1756-1787. · Zbl 1358.90109
[69] M. Prato, S. Bonettini, I. Loris, F. Porta, and S. Rebegoldi, On the constrained minimization of smooth Kurdyka-Ł ojasiewicz functions with the scaled gradient projection method, J. Phys. Conf. Ser., 756 (2016), 012001, https://doi.org/http://dx.doi.org/10.1088/1742-6596/756/1/012001.
[70] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, SENSE: Sensitivity encoding for fast MRI, Magnetic Resonance in Medicine, 42 (1999), pp. 952-62.
[71] S. Ramani and J. A. Fessler, Parallel MR image reconstruction using augmented lagrangian methods, IEEE Trans. Medical Imaging, 30 (2011), pp. 694-706.
[72] A. Repetti, M. Q. Pham, L. Duval, E. Chouzenoux, and J.-C. Pesquet, Euclid in a taxicab: Sparse blind deconvolution with smoothed \(ell_1-ell_2\) regularization, IEEE Signal Process. Lett., 22 (2014), pp. 539-543.
[73] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970. · Zbl 0193.18401
[74] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Grundlehren Math. Wiss. 317, Springer, New York, 2009.
[75] F. Schöpfer, A. K. Louis, and T. Schuster, Nonlinear iterative methods for linear ill-posed problems in Banach spaces, Inverse Problems, 22 (2006), 311. · Zbl 1088.65052
[76] M. Teboulle, Entropic proximal mappings with applications to nonlinear programming, Math. Oper. Res., 17 (1992), pp. 670-690. · Zbl 0766.90071
[77] M. Uecker, T. Hohage, K. T. Block, and J. Frahm, Image reconstruction by regularized nonlinear inversion-joint estimation of coil sensitivities and image content, Magnetic Resonance in Medicine, 60 (2008), pp. 674-682.
[78] T. Valkonen, A primal-dual hybrid gradient method for nonlinear operators with applications to MRI, Inverse Problems, 30 (2014), 055012. · Zbl 1310.47081
[79] H. Wang and A. Banerjee, Bregman alternating direction method of multipliers, in Advances in Neural Information Processing Systems, 2014, pp. 2816-2824.
[80] Y. Xu and W. Yin, A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion, SIAM J. Imaging Sciences, 6 (2013), pp. 1758-1789. · Zbl 1280.49042
[81] Y. Xu and W. Yin, A globally convergent algorithm for nonconvex optimization based on block coordinate update, J. Sci. Comput., (2017), pp. 1-35. · Zbl 1378.65126
[82] W. Yin, Analysis and generalizations of the linearized Bregman method, SIAM J. Imaging Sci., 3 (2010), pp. 856-877. · Zbl 1211.68491
[83] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, Bregman iterative algorithms for \(\ell_1\)-minimization with applications to compressed sensing, SIAM J. Imaging Sci., 1 (2008), pp. 143-168. · Zbl 1203.90153
[84] M. Zhu and T. Chan, An Efficient Primal-Dual Hybrid Gradient Algorithm for Total Variation Image Restoration, UCLA CAM Report 34, 2008.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.