
Altering Gaussian process to Student-\(t\) process for maximum distribution construction. (English) Zbl 07471185

Summary: Gaussian process (GP) regression is widely used to find the extremum of a black-box function by iteratively refining an approximation of the objective function as new evaluations are obtained. Each new evaluation point is usually chosen by optimising a given acquisition function. However, in non-parametric Bayesian optimisation the extremum of the objective function is not a deterministic value but a random variable with a distribution. We call this distribution the maximum distribution; it is generally not available in analytical form. Constructing it with the traditional GP regression approach of optimising an acquisition function is computationally expensive, since fitting the GP model has cubic cost in the number of training data. Moreover, the acquisition function introduces extra hyperparameters that make the optimisation more complicated. Recently, inspired by the Sequential Monte Carlo method and its application to Bayesian optimisation, a Monte Carlo-style method was proposed to approximate the maximum distribution with weighted samples. As an alternative to this GP-based method, we construct the maximum distribution within the framework of the Student-\(t\) process (TP), which accounts for more of the uncertainty in the training data. Toy examples and a real-data experiment show that the TP-based Monte Carlo maximum distribution performs competitively with the GP-based method.
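
The following minimal Python sketch (not the authors' implementation) illustrates the idea: it conditions a Student-\(t\) process on a handful of observations using the TP predictive formulas of Shah et al. [39], draws joint multivariate-\(t\) sample paths over a test grid, and keeps the maximum of each path as an empirical approximation of the maximum distribution. The RBF kernel, the degrees of freedom \(\nu = 5\), and the toy sine objective are illustrative assumptions, and the samples here are equally weighted rather than carrying Sequential Monte Carlo weights.

    import numpy as np

    def rbf_kernel(x1, x2, length_scale=0.3):
        # Squared-exponential kernel on 1-D inputs (an illustrative choice).
        d2 = (x1[:, None] - x2[None, :]) ** 2
        return np.exp(-0.5 * d2 / length_scale ** 2)

    def tp_posterior(X_train, y_train, X_test, nu=5.0, noise=1e-6):
        # Student-t process conditional (Shah et al. [39]): predictive mean,
        # covariance, and updated degrees of freedom given n observations.
        n = len(X_train)
        K = rbf_kernel(X_train, X_train) + noise * np.eye(n)
        K_s = rbf_kernel(X_train, X_test)
        K_ss = rbf_kernel(X_test, X_test)
        K_inv = np.linalg.inv(K)
        mean = K_s.T @ K_inv @ y_train
        beta = y_train @ K_inv @ y_train  # data-dependent scale term
        cov = (nu + beta - 2.0) / (nu + n - 2.0) * (K_ss - K_s.T @ K_inv @ K_s)
        return mean, cov, nu + n

    def sample_max_distribution(mean, cov, df, n_samples=2000, seed=0):
        # Draw joint multivariate-t sample paths on the test grid and keep the
        # maximum of each path: an empirical (equally weighted) approximation
        # of the maximum distribution.
        rng = np.random.default_rng(seed)
        C = 0.5 * (cov + cov.T) + 1e-6 * np.eye(len(mean))  # jitter for stability
        L = np.linalg.cholesky(C)
        z = rng.standard_normal((n_samples, len(mean)))
        g = rng.chisquare(df, n_samples) / df  # chi-square mixing variable
        paths = mean + (z @ L.T) / np.sqrt(g)[:, None]
        return paths.max(axis=1)

    # Toy usage: five noisy evaluations of an assumed sine objective.
    rng = np.random.default_rng(1)
    X_train = rng.uniform(0.0, 1.0, 5)
    y_train = np.sin(6.0 * X_train) + 0.05 * rng.standard_normal(5)
    X_test = np.linspace(0.0, 1.0, 200)
    mean, cov, df = tp_posterior(X_train, y_train, X_test)
    maxima = sample_max_distribution(mean, cov, df)
    print("maximum distribution: mean=%.3f, std=%.3f" % (maxima.mean(), maxima.std()))

Because the multivariate \(t\) is a scale mixture of Gaussians, each path is a Gaussian draw divided by an independent chi-square mixing variable; this heavier-tailed sampling is what distinguishes the TP construction from its GP counterpart.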

MSC:

62-XX Statistics

Software:

Spearmint
Full Text: DOI

References:

[1] Aftab, W.; Freitas, A. D.; Mihaylova, L., A Gaussian process convolution particle filter for multiple extended objects tracking with non-regular shapes, 1-8 (2018)
[2] Banerjee, A.; Dunson, D.; Tokdar, S., Efficient Gaussian process regression for large data sets, Biometrika, 100, 1, 75-89 (2013) · Zbl 1284.62257
[3] Barsce, J. C., Palombarini, J. A., & Martínez, E. C. (2018). Towards Autonomous Reinforcement Learning: Automatic Setting of Hyper-parameters using Bayesian Optimization. https://arxiv.org/abs/1805.04748
[4] Belyaev, M.; Burnaev, E.; Kapushev, Y., Computationally efficient algorithm for Gaussian process regression in case of structured samples, Computational Mathematics and Mathematical Physics, 56, 4, 499-513 (2016) · Zbl 1403.62141
[5] Bijl, H., Schön, T. B., Wingerden, J.-W. v., & Verhaegen, M. (2017). A sequential Monte Carlo approach to Thompson sampling for Bayesian optimization. https://arxiv.org/abs/1604.00169v3
[6] Bijl, H.; Wingerden, J.-W. v.; Schön, T. B.; Verhaegen, M., Online sparse Gaussian process regression using FITC and PITC approximations, Proceedings of the IFAC Symposium on System Identification, 48, 28, 703-708 (2015)
[7] Brochu, E., Cora, V. M., & Freitas, N. d. (2010). A tutorial on Bayesian optimization of expensive cost functions with application to active user modeling and hierarchical reinforcement learning. https://arXiv.org/abs/1012.2599
[8] Cao, Y.; Brubaker, M. A.; Fleet, D. J.; Hertzmann, A., Efficient optimization for sparse Gaussian process regression, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 12, 2415-2427 (2015)
[9] Chen, Z.; Wang, B., How priors of initial hyperparameters affect Gaussian process regression models, Neurocomputing, 275, 1702-1710 (2018)
[10] Chen, Z.; Wang, B.; Gorban, A. N., Multivariate Gaussian and Student-t process regression for multi-output prediction, Neural Computing and Applications, 32, 8, 3005-3028 (2020)
[11] Dahlin, J.; Lindsten, F., Particle filter-based Gaussian process optimisation for parameter inference, Proceedings of the 19th World Congress on the International Federation of Automatic Control, 47, 3, 8675-8680 (2014)
[12] Das, S.; Roy, S.; Sambasivan, R., Fast Gaussian process regression for Big data, Big Data Research, 14, 12-26 (2018)
[13] De Vito, S.; Massera, E.; Piga, M.; Martinotto, L.; Di Francia, G., On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: Chemical, 129, 2, 750-757 (2008)
[14] Doucet, A.; Freitas, N. d.; Gordon, N., An introduction to Sequential Monte Carlo methods, in: Sequential Monte Carlo methods in practice, 3-14 (2001), Springer-Verlag · Zbl 1056.93576
[15] Doucet, A.; Johansen, A. M., A tutorial on particle filtering and smoothing: Fifteen years later, in: Crisan, D.; Rozovskii, B. (eds.), The Oxford Handbook of nonlinear filtering, 656-704 (2011), Oxford University Press · Zbl 1513.60043
[16] Feurer, M.; Letham, B.; Bakshy, E., Scalable Meta-Learning for Bayesian Optimization (2018)
[17] Gordon, N. J.; Salmond, D. J.; Smith, A. F. M., Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings F - Radar and Signal Processing, 140, 2, 107-113 (1993)
[18] Grill, J.-B.; Valko, M.; Munos, R., NIPS 2015 Workshop (2015)
[19] Gutmann, M. U.; Corander, J., Bayesian optimization for likelihood-free inference of simulator-based statistical models, Journal of Machine Learning Research, 17, 125, 1-47 (2016) · Zbl 1392.62072
[20] Hansen, N.; Ostermeier, A., Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, 9, 2, 159-195 (2001)
[21] Hennig, P.; Schuler, C., Entropy search for information-efficient global optimization, JMLR, 13, 57, 1809-1837 (2012) · Zbl 1432.65073
[22] Hernández-Lobato, J.; Hoffman, M.; Ghahramani, Z., Predictive entropy search for efficient global optimization of black-box functions, NIPS, 1, 918-926 (2014)
[23] Hoffman, M.; Brochu, E.; Freitas, N. d., Portfolio allocation for Bayesian optimization, 327-336 (2011)
[24] Ilievski, I.; Akhtar, T.; Feng, J.; Shoemaker, C. A., Efficient hyperparameter optimization of deep learning algorithms using deterministic RBF surrogates (2017)
[25] Jones, D. R.; Perttunen, C. D.; Stuckman, B. E., Lipschitzian optimization without the Lipschitz constant, Journal of Optimization Theory and Applications, 79, 1, 157-181 (1993) · Zbl 0796.49032
[26] Kushner, H. J., A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise, Journal of Basic Engineering, 86, 1, 97-106 (1964)
[27] Liu, B.; Cheng, S.; Shi, Y., Particle filter optimization: A brief introduction, International Conference on Swarm Intelligence - Advances in Swarm Intelligence, Part I, 95-104 (2016)
[28] Lorraine, J., & Duvenaud, D. (2018). Stochastic Hyperparameter Optimization through Hypernetworks. CoRR. https://arxiv.org/abs/1802.09419
[29] Mockus, J., Bayesian approach to global optimization (1989), Springer · Zbl 0693.49001
[30] Mockus, J.; Tiesis, V.; Zilinskas, A., The application of Bayesian methods for seeking the extremum, in: Dixon, L. C. W.; Szego, G. P. (eds.), Towards Global Optimization 2, 117-129 (1978), North-Holland · Zbl 0394.90090
[31] Munos, R., Optimistic optimization of a deterministic function without the knowledge of its smoothness (2011)
[32] Neerukatti, R. K.; Fard, M. Y.; Chattopadhyay, A., Gaussian process-based particle-filtering approach for real-time damage prediction with application, Journal of Aerospace Engineering, 30, 1 (2017)
[33] Nickisch, H.; Solin, A.; Grigorievskiy, A., State space Gaussian processes with non-Gaussian likelihood, Proceedings of the 35th International Conference on Machine Learning, PMLR, 80, 3789-3798 (2018)
[34] Nyikosa, F. M., Osborne, M. A., & Roberts, S. J. (2018). Bayesian Optimization for Dynamic Problems. https://arxiv.org/abs/1803.03432
[35] Pautrat, R.; Chatzilygeroudis, K.; Mouret, J.-B., Bayesian optimization with automatic prior selection for data-efficient direct policy search (2018) · doi:10.1109/ICRA.2018.8463197
[36] Rasmussen, C. E.; Williams, C. K. I., Gaussian processes for machine learning (2006), MIT Press · Zbl 1177.68165
[37] Schön, T. B.; Svensson, A.; Murray, L.; Lindsten, F., Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo, Mechanical Systems and Signal Processing, 104, 866-883 (2018)
[38] Seiferth, D.; Chowdhary, G.; Mühlegg, M.; Holzapfel, F., Online Gaussian process regression with non-Gaussian likelihood (2017) · doi:10.23919/ACC.2017.7963429
[39] Shah, A.; Wilson, A.; Ghahramani, Z., Student-t processes as alternatives to Gaussian processes, Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, PMLR, 33, 877-885 (2014)
[40] Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R. P.; Freitas, N. d., Taking the human out of the loop: A review of Bayesian optimization, Proceedings of the IEEE, 104, 1, 148-175 (2016)
[41] Snoek, J.; Larochelle, H.; Adams, R. P., Practical Bayesian optimization of machine learning algorithms, Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS 2012), 2, 2951-2959 (2012)
[42] Solin, A.; Särkkä, S., State space methods for efficient inference in Student-t process regression (2015)
[43] Srinivas, N.; Krause, A.; Kakade, S.; Seeger, M., Gaussian process optimization in the bandit setting: No regret and experimental design (2010)
[44] Svensson, A.; Dahlin, J.; Schön, T. B., Marginalizing Gaussian process hyperparameters using sequential Monte Carlo (2015) · doi:10.1109/CAMSAP.2015.7383840
[45] Svensson, A.; Solin, A.; Särkkä, S.; Schön, T. B., Computationally efficient Bayesian learning of Gaussian process state space models, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR, 51, 213-221 (2016)
[46] Tang, Q.; Niu, L.; Wang, Y.; Dai, T.; An, W.; Cai, J.; Xia, S.-T., Student-t process regression with Student-t likelihood (2017) · doi:10.24963/ijcai.2017/393
[47] Tracey, B. D.; Wolpert, D. H., Upgrading from Gaussian Processes to Student’s-T Processes (2018) · doi:10.2514/6.2018-1659
[48] Valko, M.; Carpentier, A.; Munos, R., Stochastic simultaneous optimistic optimization, Proceedings of the 30th International Conference on Machine Learning, PMLR, 28, 2, 19-27 (2013)
[49] Wang, Z.; Shakibi, B.; Jin, L.; Freitas, N. d., Bayesian multi-scale optimistic optimization, Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, PMLR, 33, 1005-1014 (2014)
[50] Xiong, X.; Smıdl, V.; Filippone, M., Adaptive multiple importance sampling for Gaussian processes, Journal of Statistical Computation and Simulation, 87, 8, 1644-1667 (2017) · Zbl 07192021
[51] Zhang, T.; Zhao, Q.; Shin, K.; Nakamoto, Y., Bayesian-optimization-based peak searching algorithm for clustering in wireless sensor networks, Journal of Sensor and Actuator Networks, 7, 1, 2 (2018)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data-conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or perfect matching.