Bayesian optimization with expensive integrands. (English) Zbl 07510409

Summary: Nonconvex, derivative-free, time-consuming objectives are often optimized using "black-box" optimization. These approaches assume very little about the objective. While broadly applicable, they typically require more evaluations than methods that exploit more problem structure. Often, such time-consuming objectives are actually the sum or integral of a large number of functions, each of which consumes significant time to evaluate individually. This structure arises in designing aircraft, choosing parameters in ride-sharing dispatch systems, and tuning hyperparameters in deep neural networks. We develop a novel Bayesian optimization algorithm that leverages this structure to improve performance. Our algorithm is average-case optimal by construction when a single evaluation of the integrand remains within our evaluation budget. Achieving this one-step optimality requires solving a challenging value-of-information optimization problem, for which we provide a novel, efficient, discretization-free computational method. We also prove consistency of our method, for objective functions that are sums, on both continuous and finite discrete domains. In numerical experiments comparing against previous state-of-the-art methods, including those that also leverage sum or integral structure, our method performs as well or better across a wide range of problems, and it offers significant improvements when evaluations are noisy or when the integrand varies smoothly in the integrated variables.
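
The central mechanism described above can be made concrete with a short sketch: place a Gaussian process prior on the integrand f(x, w) over the joint (x, w) space, so that one expensive evaluation at a single pair (x, w) updates the posterior on the whole objective F(x) = (1/m) Σ_w f(x, w), and pick the next pair to evaluate by a one-step value-of-information criterion. The Python below is a minimal illustration under those assumptions, not the authors' implementation: the toy integrand, the candidate grids, the fixed kernel hyperparameters, and the brute-force Monte Carlo value-of-information search are all stand-ins (the paper computes the acquisition with a discretization-free method).

import numpy as np

# Minimal sketch: Bayesian optimization of F(x) = (1/m) * sum_w f(x, w)
# when each evaluation of the integrand f at a single (x, w) is expensive.
# A GP prior on f over the joint (x, w) space means one observation
# informs the posterior on F at every x.

def rbf_kernel(A, B, ls=0.3, var=1.0):
    """Squared-exponential kernel between point sets A (n,d) and B (m,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls**2)

def gp_posterior(Z, y, Ztest, noise=1e-4):
    """GP posterior mean and covariance at Ztest given data (Z, y)."""
    K = rbf_kernel(Z, Z) + noise * np.eye(len(Z))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(Z, Ztest)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    return mu, rbf_kernel(Ztest, Ztest) - v.T @ v

def f(x, w):
    """Toy integrand; stands in for an expensive simulator run at (x, w)."""
    return -np.sin(5 * x) * (1 + 0.5 * w)

w_grid = np.linspace(0.0, 1.0, 5)        # equally weighted integration points w
x_cand = np.linspace(0.0, 1.0, 21)       # candidate solutions x
pairs = np.array([[x, w] for x in x_cand for w in w_grid])

rng = np.random.default_rng(0)
Z = pairs[rng.choice(len(pairs), 3, replace=False)]   # initial design
y = np.array([f(x, w) for x, w in Z])

for _ in range(10):
    mu_all, _ = gp_posterior(Z, y, pairs)
    F_mu = mu_all.reshape(len(x_cand), len(w_grid)).mean(axis=1)
    incumbent = F_mu.min()               # current min of F's posterior mean
    best_gain, best_pair = -np.inf, None
    for z in pairs:                      # brute-force VoI search; illustration only
        muz, covz = gp_posterior(Z, y, z[None, :])
        gain = 0.0
        for _ in range(16):              # Monte Carlo over the fantasized observation
            yz = muz[0] + np.sqrt(max(covz[0, 0], 0.0)) * rng.standard_normal()
            mu_new, _ = gp_posterior(np.vstack([Z, z]), np.append(y, yz), pairs)
            F_new = mu_new.reshape(len(x_cand), len(w_grid)).mean(axis=1)
            gain += (incumbent - F_new.min()) / 16   # expected drop in that min
        if gain > best_gain:
            best_gain, best_pair = gain, z
    Z = np.vstack([Z, best_pair])        # evaluate only the chosen (x, w) pair
    y = np.append(y, f(*best_pair))

mu_final, _ = gp_posterior(Z, y, pairs)
F_final = mu_final.reshape(len(x_cand), len(w_grid)).mean(axis=1)
print("estimated minimizer x* ~=", x_cand[F_final.argmin()])

The acquisition here is a knowledge-gradient-style quantity: the expected one-step decrease in the minimum of F's posterior mean from observing one fantasized integrand value. The grid over (x, w) and the Monte Carlo average are the crude, expensive parts that the paper's discretization-free method is designed to avoid.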

MSC:

62F15 Bayesian inference
62K99 Design of statistical experiments
62L05 Sequential statistical design
62F07 Statistical ranking and selection procedures
