×

Technical note – knowledge gradient for selection with covariates: consistency and computation. (English) Zbl 1533.91125

Summary: Knowledge gradient is a design principle for developing Bayesian sequential sampling policies to solve optimization problems. In this paper, we consider the ranking and selection problem in the presence of covariates, where the best alternative is not universal but depends on the covariates. In this context, we prove that under minimal assumptions, the sampling policy based on knowledge gradient is consistent, in the sense that following the policy the best alternative as a function of the covariates will be identified almost surely as the number of samples grows. We also propose a stochastic gradient ascent algorithm for computing the sampling policy and demonstrate its performance via numerical experiments.
{© 2021 Wiley Periodicals LLC.}

MSC:

91B06 Decision theory

Software:

Cornell-MOE

References:

[1] Adler, R. J., & Taylor, J. E. (2007). Random fields and geometry. Springer. · Zbl 1149.60003
[2] Ankenman, B., Nelson, B. L., & Staum, J. (2010). Stochastic kriging for simulation metamodeling. Operations Research, 58(2), 371-382. · Zbl 1342.62134
[3] Arora, N., Dreze, X., Ghose, A., Hess, J. D., Iyengar, R., Jing, B., Joshi, Y., Kumar, V., Lurie, N., Neslin, S., Sajeesh, S., Su, M., Syam, N., Thomas, J., & Zhang, Z. J. (2008). Putting one‐to‐one marketing to work: Personalization, customization, and choice. Marketing Letters, 19(3-4), 305-321.
[4] Bect, J., Bachoc, F., & Discourager, D. (2019). A supermartingale approach to Gaussian process based sequential design of experiments. Bernoulli, 25(4A), 2883-2919. · Zbl 1428.62369
[5] Bubeck, S., & Cesa‐Bianchi, N. (2012). Regret analysis of stochastic and nonstochastic multi‐armed bandit problems. Foundations and Trends in Machine Learning, 5(1), 1-122. · Zbl 1281.91051
[6] Chen, C.‐H., Chick, S. E., Lee, L. H., & Pujowidianto, N. A. (2015). Ranking and selection: Efficient simulation budget allocation. In M. C.Fu (ed.) (Ed.), Handbook of simulation optimization (pp. 45-80). Springer.
[7] Choi, S. E., Perzan, K. E., Tramontano, A. C., Kong, C. Y., & Hur, C. (2014). Statins and aspirin for chemoprevention in Barrett’s esophagus: Results of a cost‐effectiveness analysis. Cancer Prevention Research, 7(3), 341-350.
[8] Frazier, P., Powell, W., & Dayanik, S. (2009). The knowledge‐gradient policy for correlated normal beliefs. INFORMS Journal on Computing, 21(4), 599-613. · Zbl 1243.91014
[9] Frazier, P. I., Powell, W., & Dayanik, S. (2008). A knowledge gradient policy for sequential information collection. SIAM Journal on Control and Optimization, 47(5), 2410-2439. · Zbl 1274.62155
[10] Frazier, P. I., & Powell, W. B. (2011). Consistency of sequential Bayesian sampling policies. SIAM Journal on Control and Optimization, 49(2), 712-731. · Zbl 1284.62170
[11] Hu, R., & Ludkovski, M. (2017). Sequential design for ranking response surfaces. SIAM/ASA Journal on Uncertainty Quantification, 5(1), 212-239. · Zbl 1365.62319
[12] Hur, C., Nishioka, N. S., & Gazelle, G. S. (2004). Cost‐effectiveness of aspirin chemoprevention for Barrett’s esophagus. Journal of the National Cancer Institute, 96(4), 316-325.
[13] Kim, E. S., Herbst, R. S., Wistuba, I. I., Lee, J. J., Blumenschein, G. R., Jr., Tsao, A., Stewart, D. J., Hicks, M. E., Erasmus, J., Jr., Gupta, S., Alden, C. M., Liu, S., Tang, X., Khuri, F. R., Tran, H. T., Johnson, B. E., Heymach, J. V., Mao, L., Fossella, F., … Hong, W. K. (2011). The BATTLE trial: Personalizing therapy for lung cancer. Cancer Discovery, 1(1), 44-53.
[14] Kim, S.‐H., & Nelson, B. L. (2006). Selecting the best system. In S. G.Henderson (ed.) & B. L.Nelson (ed.) (Eds.), Handbooks in operations research and management science (Vol. 13, pp. 501-534). Elsevier. · Zbl 1170.90300
[15] Krause, A., & Ong, C. S. (2011). Contextual Gaussian process bandit optimization. In J.Shawe‐Taylor (ed.), R. S.Zemel (ed.), P. L.Bartlett (ed.), F.Pereira (ed.), & K. Q.Weinberger (ed.) (Eds.), Advances in neural information processing systems. (Vol. 24, pp. 2447-2455). Curran Associates, Inc.
[16] Kushner, H. J., & Yin, G. G. (2003). Stochastic approximation and recursive algorithms and applications. Springer‐Verlag. · Zbl 1026.62084
[17] L’Ecuyer, P. (1995). Note: On the interchange of derivative and expectation for likelihood ratio derivative estimators. Management Science, 41(4), 738-747. · Zbl 0844.62003
[18] Mes, M. R., Powell, W. B., & Frazier, P. I. (2011). Hierarchical knowledge gradient for sequential sampling. Journal of Machine Learning Research, 12, 2931-2974. · Zbl 1280.68184
[19] Newton, D., Yousefian, F., & Pasupathy, R. (2018). Stochastic gradient descent: Recent trends. In E.Gel (ed.) & D.Lewis (ed.) (Eds.), TutORials in operations research (Vol. 7, pp. 193-220) INFORMS. Institute for Operations Research and the Management Sciences (INFORMS).
[20] Pearce, M., & Branke, J. (2017). Efficient expected improvement estimation for continuous multiple ranking and selection. In Chan W. K. V., D’Ambrogio A., Zacharewicz G, Mustafee N., Wainer G., Page E., Proceedings of the 2017 winter simulation conference (pp. 2161-2172). IEEE Press.
[21] Pearce, M., & Branke, J. (2018). Continuous multi‐task Bayesian optimisation with correlation. European Journal of Operational Research, 270(3), 1074-1085. · Zbl 1403.90555
[22] Perchet, V., & Rigollet, P. (2013). The multi‐armed bandit problem with covariates. The Annals of Statistics, 41(2), 693-721. · Zbl 1360.62436
[23] Poloczek, M., Wang, J., & Frazier, P. I. (2017). Multi‐information source optimization. In I.Guyon (ed.), U. V.Luxburg (ed.), S.Bengio (ed.), H.Wallach (ed.), R.Fergus (ed.), S.Vishwanathan (ed.), & R.Garnett (ed.) (Eds.), Advances in neural information processing systems (Vol. 30, pp. 4288-4298). Curran Associates, Inc.
[24] Polyak, B. T., & Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4), 838-855. · Zbl 0762.62022
[25] Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press. · Zbl 1177.68165
[26] Rusmevichientong, P., & Tsitsiklis, J. N. (2010). Linearly parameterized bandits. Mathematics of Operations Research, 35(2), 395-411. · Zbl 1217.93190
[27] Ryzhov, I. O. (2016). On the convergence rates of expected improvement methods. Operations Research, 64(6), 1515-1528. · Zbl 1359.62519
[28] Scott, W., Frazier, P., & Powell, W. (2011). The correlated knowledge gradient for simulation optimization of continuous parameters using Gaussian process regression. SIAM Journal on Optimization, 21(3), 996-1026. · Zbl 1229.62018
[29] Shen, H., Hong, L. J., & Zhang, X. (2021). Ranking and selection with covariates for personalized decision making. INFORMS Journal on Computing (Forthcoming). https://doi.org/10.1287/ijoc.2020.1009. · Zbl 07549347 · doi:10.1287/ijoc.2020.1009
[30] Steinwart, I., & Christmann, A. (2008). Support vector machines. Springer. · Zbl 1203.68171
[31] Toscano‐Palmerin, S., & Frazier, P. I. (2018). Bayesian optimization with expensive integrands. arXiv:1803.08661. · Zbl 07510409
[32] Wu, J., & Frazier, P. I. (2016). The parallel knowledge gradient method for batch Bayesian optimization. In D. D.Lee (ed.), M.Sugiyama (ed.), U. V.Luxburg (ed.), I.Guyon (ed.), & R.Garnett (ed.) (Eds.), Advances in neural information processing systems 29 (pp. 3126-3134). Curran Associates, Inc.
[33] Wu, J., Poloczek, M., Wilson, A. G., & Frazier, P. I. (2017). Bayesian optimization with gradients. In I.Guyon (ed.), U. V.Luxburg (ed.), S.Bengio (ed.), H.Wallach (ed.), R.Fergus (ed.), S.Vishwanathan (ed.), & R.Garnett (ed.) (Eds.), Advances in neural information processing systems 30 (pp. 5267-5278). Curran Associates, Inc.
[34] Xie, J., Frazier, P. I., & Chick, S. E. (2016). Bayesian optimization via simulation with pairwise sampling and correlated prior beliefs. Operations Research, 64(2), 542-559. · Zbl 1342.90107
[35] Yang, Y., & Zhu, D. (2002). Randomized allocation with nonparametric estimation for a multi‐armed bandit problem with covariates. The Annals of Statistics, 30(1), 100-121. · Zbl 1012.62088
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.