
Nesterov-aided stochastic gradient methods using Laplace approximation for Bayesian design optimization. (English) Zbl 1436.62372

Summary: Finding the best experimental setup is the central concern of Optimal Experimental Design (OED). Here, we focus on the Bayesian experimental design problem of finding the setup that maximizes the Shannon expected information gain. We use stochastic gradient descent and its accelerated counterpart, which employs Nesterov’s method, to solve the optimization problem in OED. We adapt a restart technique, originally proposed for accelerated gradient schemes in deterministic optimization, to improve these stochastic optimization methods. We combine the optimization methods with three estimators of the objective function: the double-loop Monte Carlo estimator (DLMC), the Monte Carlo estimator using the Laplace approximation of the posterior distribution (MCLA), and the double-loop Monte Carlo estimator with Laplace-based importance sampling (DLMCIS). Combining stochastic gradient methods with Laplace-based estimators makes it affordable to use expensive and complex models, such as those that require solving partial differential equations (PDEs). From a theoretical viewpoint, we derive an explicit formula for the gradient estimator of the Monte Carlo methods, including MCLA and DLMCIS. From a computational standpoint, we study four examples: three based on analytical functions and one using the finite element method. The last example is an electrical impedance tomography experiment based on the complete electrode model. In these examples, the accelerated stochastic gradient descent method using MCLA converges to local maxima with up to five orders of magnitude fewer model evaluations than gradient descent with DLMC.
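
To make the method concrete, the following is a minimal sketch (not the authors’ code) of the core loop: Nesterov-accelerated stochastic gradient ascent with function-value restart, applied to a double-loop Monte Carlo (DLMC) estimate of the expected information gain. The toy forward model y = θ·sin(ξ) + ε, the standard normal prior, the noise level, step size, and sample sizes are all hypothetical choices for illustration, and automatic differentiation of the reparameterized estimator stands in for the explicit gradient formula derived in the paper.

```python
# Minimal sketch (not the authors' code): Nesterov-accelerated stochastic
# gradient ascent with function-value restart on a double-loop Monte Carlo
# (DLMC) estimate of the expected information gain. The toy forward model
# y = theta * sin(xi) + noise, prior theta ~ N(0, 1), and all tuning
# constants below are hypothetical choices for illustration.
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

SIGMA = 0.1  # observation-noise standard deviation (assumed)

def log_lik(y, theta, xi):
    """Gaussian log-likelihood log p(y | theta, xi) of the toy model."""
    mu = theta * jnp.sin(xi)  # stand-in for an expensive forward model
    return -0.5 * ((y - mu) / SIGMA) ** 2 - jnp.log(SIGMA * jnp.sqrt(2.0 * jnp.pi))

def dlmc_eig(xi, key, n_outer=64, n_inner=64):
    """DLMC estimator of the expected information gain at design xi.

    Outer loop: samples (theta_i, y_i) from the joint; inner loop: a fresh
    Monte Carlo estimate of the evidence p(y_i | xi). Taking the log of the
    inner average makes the estimator biased for finite n_inner.
    """
    k1, k2, k3 = jax.random.split(key, 3)
    theta = jax.random.normal(k1, (n_outer,))                # outer prior draws
    y = theta * jnp.sin(xi) + SIGMA * jax.random.normal(k2, (n_outer,))
    theta_in = jax.random.normal(k3, (n_inner,))             # inner prior draws
    log_ev = logsumexp(log_lik(y[:, None], theta_in[None, :], xi),
                       axis=1) - jnp.log(n_inner)
    return jnp.mean(log_lik(y, theta, xi) - log_ev)

# Autodiff through the reparameterized sampler gives a stochastic gradient;
# the paper instead derives this gradient estimator in closed form.
grad_eig = jax.jit(jax.grad(dlmc_eig))

key = jax.random.PRNGKey(0)
xi = jnp.array(0.3)          # initial design
z, t = xi, 1.0               # lookahead point and momentum parameter
step, prev = 0.05, -jnp.inf
for _ in range(200):
    key, sub = jax.random.split(key)
    g = grad_eig(z, sub)                              # gradient at lookahead point
    xi_new = z + step * g                             # plain ascent step
    t_new = 0.5 * (1.0 + (1.0 + 4.0 * t * t) ** 0.5)
    z = xi_new + ((t - 1.0) / t_new) * (xi_new - xi)  # Nesterov extrapolation
    xi, t = xi_new, t_new
    val = dlmc_eig(xi, sub)
    if val < prev:   # function-value restart: reset momentum on estimated decrease
        z, t = xi, 1.0
    prev = val
print("design:", float(xi), "estimated information gain:", float(val))
```

For this linear-Gaussian toy model the expected information gain is available in closed form, 0.5·log(1 + sin²(ξ)/σ²), so the iterates can be checked against the known maximizer ξ = π/2; the restart rule resets the momentum whenever the noisy objective estimate decreases, in the spirit of [12].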

MSC:

62K05 Optimal statistical designs
65C05 Monte Carlo methods
62L20 Stochastic approximation
62-08 Computational methods for problems pertaining to statistics
65N21 Numerical methods for inverse problems for boundary value problems involving PDEs

References:

[1] Chaloner, K.; Verdinelli, I., Bayesian experimental design: A review, Statist. Sci., 10, 3, 273-304 (1995) · Zbl 0955.62617
[2] Ryan, K. J., Estimating expected information gains for experimental designs with application to the random fatigue-limit model, J. Comput. Graph. Statist., 12, 3, 585-603 (2003)
[3] Huan, X., Accelerated Bayesian Experimental Design for Chemical Kinetic Models (2010), Massachusetts Institute of Technology, (Ph.D. thesis)
[4] Huan, X.; Marzouk, Y. M., Simulation-based optimal Bayesian experimental design for nonlinear systems, J. Comput. Phys., 232, 1, 288-317 (2013)
[5] Spall, J. C., A stochastic approximation algorithm for large-dimensional systems in the Kiefer-Wolfowitz setting, (Proceedings of the 27th IEEE Conference on Decision and Control (1988), IEEE), 1544-1548
[6] Long, Q.; Scavino, M.; Tempone, R.; Wang, S., Fast estimation of expected information gains for Bayesian experimental designs based on Laplace approximations, Comput. Methods Appl. Mech. Engrg., 259, 24-39 (2013) · Zbl 1286.62068
[7] Huan, X.; Marzouk, Y., Gradient-based stochastic optimization methods in Bayesian experimental design, Int. J. Uncertain. Quantif., 4, 6, 479-510 (2014)
[8] Beck, J.; Dia, B. M.; Espath, L. F.R.; Long, Q.; Tempone, R., Fast Bayesian experimental design: Laplace-based importance sampling for the expected information gain, Comput. Methods Appl. Mech. Engrg., 334, 523-553 (2018) · Zbl 1440.62293
[9] Nemirovski, A., Efficient Methods in Convex Programming (2005), Technion
[10] Robbins, H.; Monro, S., A stochastic approximation method, Ann. Math. Stat., 22, 400-407 (1951) · Zbl 0054.05901
[11] Nesterov, Y., A method of solving a convex programming problem with convergence rate O(1/k²), (Soviet Mathematics Doklady, Vol. 27 (1983)), 372-376 · Zbl 0535.90071
[12] O’Donoghue, B.; Candès, E., Adaptive restart for accelerated gradient schemes, Found. Comput. Math., 15, 3, 715-732 (2015) · Zbl 1320.90061
[13] Nitanda, A., Accelerated stochastic gradient descent for minimizing finite sums, (Artificial Intelligence and Statistics (2016)), 195-203
[14] Shannon, C. E., A mathematical theory of communication, Bell Syst. Tech. J., 27, 623-656 (1948) · Zbl 1154.94303
[15] Nelder, J. A.; Mead, R., A simplex method for function minimization, Comput. J., 7, 4, 308-313 (1965) · Zbl 0229.65053
[16] Kiefer, J.; Wolfowitz, J., Optimum designs in regression problems, Ann. Math. Stat., 30, 271-294 (1959) · Zbl 0090.11404
[17] Lai, T. L.; Robbins, H., Adaptive design and stochastic approximation, Ann. Statist., 7, 6, 1196-1221 (1979) · Zbl 0426.62059
[18] Polyak, B. T.; Juditsky, A. B., Acceleration of stochastic approximation by averaging, SIAM J. Control Optim., 30, 4, 838-855 (1992) · Zbl 0762.62022
[19] Cotter, A.; Shamir, O.; Srebro, N.; Sridharan, K., Better mini-batch algorithms via accelerated gradient methods, (Advances in Neural Information Processing Systems (2011)), 1647-1655
[20] Nesterov, Y., Introductory Lectures on Convex Optimization: A Basic Course, Vol. 87 (2013), Springer Science & Business Media
[21] Rumelhart, D. E.; Hinton, G. E.; Williams, R. J., Learning representations by back-propagating errors, Nature, 323, 6088, 533-536 (1986) · Zbl 1369.68284
[22] Johnson, R.; Zhang, T., Accelerating stochastic gradient descent using predictive variance reduction, (Advances in Neural Information Processing Systems (2013)), 315-323
[23] Allen-Zhu, Z., Katyusha: The first direct acceleration of stochastic gradient methods, (Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (2017), ACM), 1200-1205 · Zbl 1369.68273
[24] Su, W.; Boyd, S.; Candès, E. J., A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights, J. Mach. Learn. Res., 17, 153, 1-43 (2016) · Zbl 1391.90667
[25] Timoshenko, S. P., LXVI. On the correction for shear of the differential equation for transverse vibrations of prismatic bars, Lond. Edinb. Dublin Philos. Mag. J. Sci., 41, 245, 744-746 (1921)
[26] Somersalo, E.; Cheney, M.; Isaacson, D., Existence and uniqueness for electrode models for electric current computed tomography, SIAM J. Appl. Math., 52, 1023-1040 (1992) · Zbl 0759.35055