Maximum likelihood estimation of regularization parameters in high-dimensional inverse problems: an empirical Bayesian approach. II: Theoretical analysis. (English) Zbl 1457.94011

Summary: This paper presents a detailed theoretical analysis of the three stochastic approximation proximal gradient algorithms proposed in our companion paper [ibid. 13, No. 4, 1945–1989 (2020; Zbl 1457.94014)] to set regularization parameters by marginal maximum likelihood estimation. We prove the convergence of a more general stochastic approximation scheme that includes the three algorithms of [loc. cit.] as special cases. This includes asymptotic and nonasymptotic convergence results with natural and easily verifiable conditions, as well as explicit bounds on the convergence rates. Importantly, the theory is also general in that it can be applied to other intractable optimization problems. A main novelty of the work is that the stochastic gradient estimates of our scheme are constructed from inexact proximal Markov chain Monte Carlo samplers. This allows the use of samplers that scale efficiently to large problems and for which we have precise theoretical guarantees.
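The scheme described above can be illustrated with a minimal, self-contained sketch. It is not the algorithm analyzed in the paper, only a simplified one-sample SAPG-type iteration under strong assumptions chosen here for concreteness: a denoising model with known Gaussian noise variance, a Laplace prior `exp(-theta*||x||_1)` (so the normalizing-constant gradient has the closed form `d/theta` by 1-homogeneity), an inexact Moreau-Yosida-regularized unadjusted Langevin (MYULA-style) inner sampler, and illustrative step sizes and projection bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sapg_l1(y, sigma2=1.0, theta0=1.0, theta_min=1e-3, theta_max=1e2,
            n_iter=3000, gamma=0.05, lam=0.1, delta0=0.1):
    """Illustrative SAPG sketch: one inexact Langevin step in x per
    projected stochastic-gradient step in the regularization parameter theta."""
    d = y.size
    x = y.copy()
    theta = theta0
    for k in range(1, n_iter + 1):
        # Inexact sampler step targeting a smoothed version of p(x | y, theta):
        # gradient of the quadratic data-fit term plus the Moreau-Yosida
        # gradient of theta*||.||_1, then Gaussian noise injection.
        grad_f = (x - y) / sigma2
        grad_g = (x - soft_threshold(x, lam * theta)) / lam
        x = x - gamma * (grad_f + grad_g) + np.sqrt(2 * gamma) * rng.standard_normal(d)
        # One-sample estimate of d/dtheta log p(y; theta); for the Laplace
        # prior the partition-function term is exactly d/theta.
        delta_k = delta0 / k**0.8  # decreasing step sizes
        theta = float(np.clip(theta + delta_k * (d / theta - np.abs(x).sum()),
                              theta_min, theta_max))
    return theta
```

The projection onto `[theta_min, theta_max]` and the `k**-0.8` step-size decay are standard stochastic approximation choices; the convergence conditions and rates established in the paper concern a more general version of this coupled update.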

MSC:

94A08 Image processing (compression, reconstruction, etc.) in information and communication theory
68U10 Computing methodologies for image processing
62F15 Bayesian inference
65C05 Monte Carlo methods

Citations:

Zbl 1457.94014

Software:

PhaseLift; BlockPR

References:

[1] Y. F. Atchadé, G. Fort, and E. Moulines, On perturbed proximal gradient algorithms, J. Mach. Learn. Res., 18 (2017), pp. 310-342. · Zbl 1433.90199
[2] F. R. Bach and E. Moulines, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, in Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, Granada, Spain, Curran Associates, Red Hook, NY, 2011, pp. 451-459.
[3] D. Bakry, F. Barthe, P. Cattiaux, and A. Guillin, A simple proof of the Poincaré inequality for a large class of probability measures including the log-concave case, Electron. Commun. Probab., 13 (2008), pp. 60-66, https://doi.org/10.1214/ECP.v13-1352. · Zbl 1186.26011
[4] D. Bakry, I. Gentil, and M. Ledoux, Analysis and Geometry of Markov Diffusion Operators, Grundlehren Math. Wiss. 348, Springer, Cham, Switzerland, 2014, https://doi.org/10.1007/978-3-319-00227-9. · Zbl 1376.60002
[5] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed., CMS Books Math./Ouvrages Math. SMC, Springer, Cham, Switzerland, 2017, https://doi.org/10.1007/978-3-319-48311-5. · Zbl 1359.26003
[6] M. Benaïm, A dynamical system approach to stochastic approximations, SIAM J. Control Optim., 34 (1996), pp. 437-472, https://doi.org/10.1137/S0363012993253534. · Zbl 0841.62072
[7] A. Benveniste, M. Métivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations, Appl. Math. (New York) 22, Springer, Berlin, 1990, https://doi.org/10.1007/978-3-642-75894-2. · Zbl 0752.93073
[8] S. Berisha, J. G. Nagy, and R. J. Plemmons, Deblurring and sparse unmixing of hyperspectral images using multiple point spread functions, SIAM J. Sci. Comput., 37 (2015), pp. S389-S406. · Zbl 1325.94029
[9] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sensing, 5 (2012), pp. 354-379.
[10] E. J. Candès, Y. C. Eldar, T. Strohmer, and V. Voroninski, Phase retrieval via matrix completion, SIAM Rev., 57 (2015), pp. 225-251. · Zbl 1344.49057
[11] A. Chambolle and T. Pock, An introduction to continuous optimization for imaging, Acta Numer., 25 (2016), pp. 161-319. · Zbl 1343.65064
[12] E. Chouzenoux, A. Jezierska, J.-C. Pesquet, and H. Talbot, A convex approach for image restoration with exact Poisson-Gaussian likelihood, SIAM J. Imaging Sci., 8 (2015), pp. 2662-2682. · Zbl 1343.94004
[13] J. Chung and L. Nguyen, Motion estimation and correction in photoacoustic tomographic reconstruction, SIAM J. Imaging Sci., 10 (2017), pp. 216-242. · Zbl 1364.53070
[14] A. S. Dalalyan, Theoretical guarantees for approximate sampling from smooth and log-concave densities, J. Roy. Statist. Soc. Ser. B, 79 (2017), pp. 651-676. · Zbl 1411.62030
[15] A. S. Dalalyan and A. Karagulyan, User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, Stochastic Process. Appl., 129 (2019), pp. 5278-5311. · Zbl 1428.62316
[16] V. De Bortoli and A. Durmus, Convergence of Diffusions and Their Discretizations: From Continuous to Discrete Processes and Back, preprint, https://arxiv.org/abs/1904.09808 (2019).
[17] V. De Bortoli, A. Durmus, M. Pereyra, and A. F. Vidal, Efficient Stochastic Optimisation by Unadjusted Langevin Monte Carlo. Application to Maximum Marginal Likelihood and Empirical Bayesian Estimation, preprint, https://arxiv.org/abs/1906.12281 (2019).
[18] D. L. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52 (2006), pp. 1289-1306. · Zbl 1288.94016
[19] P. Dupuis and R. S. Ellis, A Weak Convergence Approach to the Theory of Large Deviations, Wiley Series in Probability and Statistics, Wiley, New York, 1997, https://doi.org/10.1002/9781118165904. · Zbl 0904.60001
[20] A. Durmus, S. Majewski, and B. Miasojedow, Analysis of Langevin Monte Carlo via convex optimization, J. Mach. Learn. Res., 20 (2019), pp. 1-46. · Zbl 1491.65009
[21] A. Durmus and E. Moulines, High-dimensional Bayesian inference via the unadjusted Langevin algorithm, Bernoulli, 25 (2019), pp. 2854-2882. · Zbl 1428.62111
[22] A. Durmus and E. Moulines, Nonasymptotic convergence analysis for the unadjusted Langevin algorithm, Ann. Appl. Probab., 27 (2017), pp. 1551-1587. · Zbl 1377.65007
[23] A. Durmus, É. Moulines, and M. Pereyra, Efficient Bayesian computation by proximal Markov chain Monte Carlo: When Langevin meets Moreau, SIAM J. Imaging Sci., 11 (2018), pp. 473-506. · Zbl 1401.65016
[24] B. Galerne and A. Leclaire, Texture inpainting using efficient Gaussian conditional simulation, SIAM J. Imaging Sci., 10 (2017), pp. 1446-1474. · Zbl 06830225
[25] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes, 2nd ed., North-Holland Math. Lib. 24, North-Holland, Amsterdam, 1989. · Zbl 0684.60040
[26] M. A. Iwen, A. Viswanathan, and Y. Wang, Fast phase retrieval from local correlation measurements, SIAM J. Imaging Sci., 9 (2016), pp. 1655-1688. · Zbl 1352.49035
[27] J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, Appl. Math. Sci. 160, Springer, 2006. · Zbl 1068.65022
[28] M. Kech and F. Krahmer, Optimal injectivity conditions for bilinear inverse problems with applications to identifiability of deconvolution problems, SIAM J. Appl. Algebra Geom., 1 (2017), pp. 20-37. · Zbl 1365.15021
[29] J. Kiefer and J. Wolfowitz, Stochastic estimation of the maximum of a regression function, Ann. of Math. Statist., 23 (1952), pp. 462-466. · Zbl 0049.36601
[30] S. Kullback, Information Theory and Statistics, Wiley, New York, 1959. · Zbl 0088.10406
[31] S. Li, X. Kang, L. Fang, J. Hu, and H. Yin, Pixel-level image fusion: A survey of the state of the art, Inform. Fusion, 33 (2017), pp. 100-112.
[32] R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes. II, Appl. Math. (New York) 6, Springer, Berlin, 2001. · Zbl 1008.62072
[33] M. Métivier and P. Priouret, Applications of a Kushner and Clark lemma to general classes of stochastic algorithms, IEEE Trans. Inform. Theory, 30 (1984), pp. 140-151, https://doi.org/10.1109/TIT.1984.1056894. · Zbl 0546.62056
[34] M. Métivier and P. Priouret, Théorèmes de convergence presque sûre pour une classe d’algorithmes stochastiques à pas décroissant, Probab. Theory Related Fields, 74 (1987), pp. 403-428, https://doi.org/10.1007/BF00699098. · Zbl 0588.62153
[35] V. I. Morgenshtern and E. J. Candès, Super-resolution of positive sources: The discrete setup, SIAM J. Imaging Sci., 9 (2016), pp. 412-444. · Zbl 1352.49041
[36] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Appl. Optim. 87, Springer, 2013. · Zbl 1086.90045
[37] G. Pólya and G. Szegö, Problems and Theorems in Analysis. I, Classics Math., Springer, Berlin, 1998, https://doi.org/10.1007/978-3-642-61983-0. · Zbl 1053.00002
[38] B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM J. Control Optim., 30 (1992), pp. 838-855, https://doi.org/10.1137/0330046. · Zbl 0762.62022
[39] A. Rakhlin, O. Shamir, and K. Sridharan, Making gradient descent optimal for strongly convex stochastic optimization, in Proceedings of the 29th International Conference on Machine Learning, Curran Associates, Red Hook, NY, 2012, pp. 1571-1578.
[40] S. Ravishankar and Y. Bresler, Efficient blind compressed sensing using sparsifying transforms with convergence guarantees and application to magnetic resonance imaging, SIAM J. Imaging Sci., 8 (2015), pp. 2519-2557. · Zbl 1343.94013
[41] H. Robbins and S. Monro, A stochastic approximation method, Ann. of Math. Statist., 22 (1951), pp. 400-407. · Zbl 0054.05901
[42] G. O. Roberts and R. L. Tweedie, Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli, 2 (1996), pp. 341-363, https://doi.org/10.2307/3318418. · Zbl 0870.60027
[43] L. Rosasco, S. Villa, and B. C. Vũ, Convergence of stochastic proximal gradient algorithm, Appl. Math. Optim., 82 (2020), pp. 891-917. · Zbl 1465.90101
[44] C.-B. Schönlieb, Partial Differential Equation Methods for Image Inpainting, Cambridge Monogr. Appl. Comput. Math. 29, Cambridge University Press, 2015. · Zbl 1335.94002
[45] O. Shamir and T. Zhang, Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes, in International Conference on Machine Learning, Curran Associates, Red Hook, NY, 2013, pp. 71-79.
[46] M. Simões, J. Bioucas-Dias, L. B. Almeida, and J. Chanussot, A convex formulation for hyperspectral image superresolution via subspace-based regularization, IEEE Trans. Geosci. Remote Sensing, 53 (2015), pp. 3373-3388.
[47] W. Su, S. P. Boyd, and E. J. Candès, A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights, J. Mach. Learn. Res., 17 (2016), 153, http://jmlr.org/papers/v17/15-084.html. · Zbl 1391.90667
[48] V. B. Tadić and A. Doucet, Asymptotic bias of stochastic gradient search, Ann. Appl. Probab., 27 (2017), pp. 3255-3304, https://doi.org/10.1214/16-AAP1272. · Zbl 1387.49044
[49] A. F. Vidal, V. De Bortoli, M. Pereyra, and A. Durmus, Maximum likelihood estimation of regularisation parameters in high-dimensional inverse problems: An empirical Bayesian approach. Part I: Methodology and experiments, SIAM J. Imaging Sci., 13 (2020), pp. 1945-1989, https://doi.org/10.1137/20M1339829. · Zbl 1457.94014
[50] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, Residual dense network for image super-resolution, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2018, pp. 2472-2481.