
Explainable subgradient tree boosting for prescriptive analytics in operations management. (English) Zbl 07765843

Summary: Motivated by the success of gradient boosting approaches in machine learning and driven by the need for explainable prescriptive analytics approaches in operations management (OM), we propose subgradient tree boosting (STB) as an explainable prescriptive analytics approach to solving convex stochastic optimization problems that frequently arise in OM. The STB approach combines the well-known method of subgradient descent in function space with sample average approximation, and prescribes decisions from a problem-specific loss function, historical demand observations, and prescriptive features. The approach provides a decision-maker with detailed explanations for the prescribed decisions, such as a breakdown of individual features’ impact. These explanations are particularly valuable in practice when the decision-maker has the discretion to adjust the recommendations made by a decision support system. We show how subgradients can be derived for common single-stage and two-stage stochastic optimization problems; demonstrate the STB approach’s applicability to two real-world, complex capacity-planning problems in the service industry; benchmark the STB approach’s performance against that of two prescriptive approaches – weighted sample average approximation (wSAA) and kernelized empirical risk minimization (kERM); and show how the STB approach’s prescriptions can be explained by estimating the impact of individual features. The results suggest that the quality of the STB approach’s prescriptions is comparable to that of the wSAA and kERM prescriptions, while the STB approach additionally provides explanations.
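
To make the boosting mechanics above concrete, the following is a minimal sketch of functional subgradient boosting for a single-stage newsvendor problem, the simplest convex stochastic optimization problem of the kind the summary describes. It is an illustration under stated assumptions, not the authors’ implementation: the cost parameters cu and co, the learning rate eta, the number of rounds, and all function names are hypothetical, and scikit-learn regression trees stand in for the paper’s base learners.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def newsvendor_subgradient(q, d, cu, co):
        # Subgradient of the newsvendor loss cu*max(d-q, 0) + co*max(q-d, 0) w.r.t. q;
        # at q == d, co is one valid element of the subdifferential [-cu, co].
        return np.where(d > q, -cu, co)

    def stb_fit(X, d, cu=4.0, co=1.0, n_rounds=200, eta=0.05, max_depth=3):
        # Initialize with the non-featurized SAA prescription: the critical-ratio quantile.
        q0 = np.quantile(d, cu / (cu + co))
        q = np.full(len(d), q0)
        trees = []
        for _ in range(n_rounds):
            g = newsvendor_subgradient(q, d, cu, co)   # subgradient at current prescriptions
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, -g)  # approximate -g in tree space
            q += eta * tree.predict(X)                 # functional subgradient step
            trees.append(tree)
        return q0, trees

    def stb_prescribe(X_new, q0, trees, eta=0.05):
        # Prescription = base decision plus the accumulated per-tree increments.
        return q0 + eta * sum(tree.predict(X_new) for tree in trees)

Because each prescription is a sum of per-tree increments, those increments can be attributed to the features the trees split on (e.g., with SHAP-style decompositions, cf. [34,35]), which is the route to the feature-impact explanations discussed above.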

MSC:

90Bxx Operations research and management science
Full Text: DOI

References:

[1] Aas, K.; Jullum, M.; Løland, A., Explaining individual predictions when features are dependent: More accurate approximations to Shapley values, Artificial Intelligence, 298, 103502 (2021) · Zbl 1520.68136
[2] Ban, G.-Y.; Rudin, C., The big data newsvendor: Practical insights from machine learning, Operations Research, 67, 1, 90-108 (2019) · Zbl 1443.90093
[3] Bastani, O.; Kim, C.; Bastani, H., Interpreting blackbox models via model extraction, arXiv preprint arXiv:1705.08504 (2017)
[4] Bazaraa, M. S.; Sherali, H. D.; Shetty, C. M., Nonlinear programming: Theory and algorithms, Wiley-Interscience series in discrete mathematics and optimization (1993), Wiley: Wiley New York · Zbl 0774.90075
[5] Bertsekas, D. P., Nonlinear programming (1995), Athena Scientific: Athena Scientific Belmont, MA · Zbl 0935.90037
[6] Bertsimas, D.; Delarue, A.; Jaillet, P.; Martin, S., The price of interpretability, arXiv preprint arXiv:1907.03419 (2019)
[7] Bertsimas, D.; Kallus, N., From predictive to prescriptive analytics, Management Science, 66, 3, 1025-1044 (2020)
[8] Bertsimas, D.; McCord, C., From predictions to prescriptions in multistage optimization problems, arXiv preprint arXiv:1904.11637 (2019)
[9] Biau, G.; Cadre, B., Optimization by gradient boosting, (Daouia, A.; Ruiz-Gazen, A., Advances in contemporary statistics and econometrics: Festschrift in honor of Christine Thomas-Agnan (2021), Springer Int. Publishing: Springer Int. Publishing Cham), 23-44 · Zbl 07645393
[10] Biau, G.; Scornet, E., A random forest guided tour, TEST, 25, 2, 197-227 (2016) · Zbl 1402.62133
[11] Bickel, P. J.; Ritov, Y.; Zakai, A., Some theory for generalized boosting algorithms, Journal of Machine Learning Research, 7, 705-732 (2006) · Zbl 1222.68148
[12] Bolton, G. E.; Ockenfels, A.; Thonemann, U. W., Managers and students as newsvendors, Management Science, 58, 12, 2225-2233 (2012)
[13] Boyd, S.; Duchi, J.; Vandenberghe, L., Subgradients: Notes for EE364b, Stanford University, Spring 2014-2015 (2018), https://web.stanford.edu/class/ee364b/lectures/subgradients_notes.pdf
[14] Boyd, S.; Vandenberghe, L., Convex optimization (2004), Cambridge University Press: Cambridge University Press New York · Zbl 1058.90049
[15] Bravo, F.; Shaposhnik, Y., Mining optimal policies: A pattern recognition approach to model analysis, INFORMS Journal on Optimization, 2, 3, 145-166 (2020)
[16] Bühlmann, P.; Hothorn, T., Boosting algorithms: Regularization, prediction and model fitting, Statistical Science, 22, 4, 477-505 (2007) · Zbl 1246.62163
[17] Burkart, N.; Huber, M. F., A survey on the explainability of supervised machine learning, Journal of Artificial Intelligence Research, 70, 245-317 (2021) · Zbl 1497.68412
[18] Chen, T.; Guestrin, C., XGBoost: A scalable tree boosting system, Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining - KDD ’16, 785-794 (2016), ACM Press: ACM Press New York, NY
[19] Ciocan, D. F.; Mišić, V. V., Interpretable optimal stopping, Management Science, 68, 3, 1616-1638 (2022)
[20] Dietvorst, B. J.; Simmons, J. P.; Massey, C., Algorithm aversion: People erroneously avoid algorithms after seeing them err, Journal of Experimental Psychology. General, 144, 1, 114-126 (2015)
[21] Dietvorst, B. J.; Simmons, J. P.; Massey, C., Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them, Management Science, 64, 3, 1155-1170 (2018)
[22] Doshi-Velez, F.; Kim, B., Towards a rigorous science of interpretable machine learning, arXiv preprint arXiv:1702.08608 (2017)
[23] Elmachtoub, A. N.; Grigas, P., Smart “Predict, then Optimize”, Management Science, 68, 1, 9-26 (2022)
[24] Elmachtoub, A. N.; Liang, J. C. N.; McNellis, R., Decision trees for decision-making under the predict-then-optimize framework, (Daumé, H. III; Singh, A., Proceedings of the 37th international conference on machine learning. Proceedings of the 37th international conference on machine learning, Proceedings of Machine Learning Research, vol. 119 (2020), PMLR), 2858-2867
[25] Fildes, R.; Goodwin, P.; Lawrence, M.; Nikolopoulos, K., Effective forecasting and judgmental adjustments: An empirical evaluation and strategies for improvement in supply-chain planning, International Journal of Forecasting, 25, 1, 3-23 (2009)
[26] Freund, Y.; Schapire, R. E., A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55, 1, 119-139 (1997) · Zbl 0880.68103
[27] Friedman, J. H., Greedy function approximation: A gradient boosting machine, The Annals of Statistics, 29, 5, 1189-1232 (2001) · Zbl 1043.62034
[28] Friedman, J. H., Stochastic gradient boosting, Computational Statistics & Data Analysis, 38, 4, 367-378 (2002) · Zbl 1072.65502
[29] Gilpin, L. H.; Bau, D.; Yuan, B. Z.; Bajwa, A.; Specter, M.; Kagal, L., Explaining explanations: An overview of interpretability of machine learning, 2018 IEEE 5th international conference on data science and advanced analytics (DSAA), 80-89 (2018), IEEE
[30] Grubb, A.; Bagnell, J. A., Generalized boosting algorithms for convex optimization, Proceedings of the 28th international conference on machine learning, ICML’11 (2011), Omnipress: Omnipress Madison, WI
[31] Imdahl, C.; Hoberg, K.; Schmidt, W., Targeted automation of order decisions using machine learning, SSRN Electronic Journal (2021)
[32] Kesavan, S.; Kushwaha, T., Field experiment on the profit implications of merchants’ discretionary power to override data-driven decision-making tools, Management Science, 66, 11, 5182-5190 (2020)
[33] Lee, Y. S.; Siemsen, E., Task decomposition and newsvendor decision making, Management Science, 63, 10, 3226-3245 (2017)
[34] Lundberg, S. M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J. M.; Nair, B.; Lee, S.-I., From local explanations to global understanding with explainable AI for trees, Nature Machine Intelligence, 2, 1, 56-67 (2020)
[35] Lundberg, S. M.; Lee, S.-I., A unified approach to interpreting model predictions, Proceedings of the 31st international conference on neural information processing systems. Proceedings of the 31st international conference on neural information processing systems, NIPS’17, 4768-4777 (2017), Curran Associates Inc.: Curran Associates Inc. Red Hook, NY
[36] Marcinkevičs, R.; Vogt, J. E., Interpretability and explainability: A machine learning zoo mini-tour, arXiv preprint arXiv:2012.01805 (2020)
[37] Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, M., Boosting algorithms as gradient descent, Proceedings of the 12th international conference on neural information processing systems. Proceedings of the 12th international conference on neural information processing systems, NIPS’99, 512-518 (1999), MIT Press: MIT Press Cambridge, MA
[38] Meir, R.; Rätsch, G., An introduction to boosting and leveraging, (Mendelson, S.; Smola, A. J., Advanced lectures on machine learning: Machine learning summer school 2002 canberra, australia, february 11-22, 2002 revised lectures (2003), Springer: Springer Berlin, Heidelberg), 118-183 · Zbl 1019.68092
[39] Miller, T., Explanation in artificial intelligence: Insights from the social sciences, Artificial Intelligence, 267, 1-38 (2019) · Zbl 1478.68274
[40] Molnar, C., Interpretable machine learning: A guide for making black box models explainable (2022), Christoph Molnar: Christoph Molnar Munich
[41] Netessine, S.; Dobson, G.; Shumsky, R. A., Flexible service capacity: Optimal investment and the impact of demand correlation, Operations Research, 50, 2, 375-388 (2002) · Zbl 1163.90343
[42] Notz, P. M.; Pibernik, R., Prescriptive analytics for flexible capacity management, Management Science, 68, 3, 1756-1775 (2021)
[43] Notz, P. M.; Wolf, P. K.; Pibernik, R., Prescriptive analytics for a multi-shift staffing problem, European Journal of Operational Research, 305, 2, 887-901 (2023) · Zbl 1541.90231
[44] Prahl, A.; van Swol, L., Understanding algorithm aversion: When is advice from automation discounted?, Journal of Forecasting, 36, 6, 691-702 (2017)
[45] Ratliff, N.; Bagnell, J. A.; Srinivasa, S. S., Imitation learning for locomotion and manipulation, 2007 7th IEEE-RAS international conference on humanoid robots, 392-397 (2007), IEEE
[46] Ratliff, N.; Bradley, D.; Bagnell, J. A.; Chestnutt, J., Boosting structured prediction for imitation learning, Proceedings of the 19th international conference on neural information processing systems. Proceedings of the 19th international conference on neural information processing systems, NIPS’06, 1153-1160 (2006), MIT Press: MIT Press Cambridge, MA
[47] Ratliff, N. D.; Silver, D.; Bagnell, J. A., Learning to search: Functional gradient techniques for imitation learning, Autonomous Robots, 27, 1, 25-53 (2009)
[48] Ribeiro, M. T.; Singh, S.; Guestrin, C., “Why Should I Trust You?”: Explaining the predictions of any classifier, Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16, 1135-1144 (2016), Association for Computing Machinery: Association for Computing Machinery New York, NY, USA
[49] Rockafellar, R. T., Convex analysis, Princeton Mathematical Series, volume 28 (1970), Princeton University Press: Princeton University Press Princeton, NJ · Zbl 0229.90020
[50] Rudin, C., Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nature Machine Intelligence, 1, 5, 206-215 (2019)
[51] Sandulescu, V.; Chiru, M., Predicting the future relevance of research institutions - The winning solution of the KDD cup 2016, arXiv preprint arXiv:1609.02728 (2016)
[52] Schapire, R. E., The strength of weak learnability, Machine Learning, 5, 2, 197-227 (1990)
[53] Schapire, R. E.; Freund, Y., Boosting: Foundations and algorithms, Adaptive computation and machine learning (2012), MIT Press: MIT Press Cambridge, MA · Zbl 1278.68021
[54] Senoner, J.; Netland, T.; Feuerriegel, S., Using explainable artificial intelligence to improve process quality: Evidence from semiconductor manufacturing, Management Science, 68, 8, 5704-5723 (2022)
[55] Shapiro, A., Monte Carlo sampling approach to stochastic programming, ESAIM: Proceedings, 13, 65-73 (2003) · Zbl 1053.90103
[56] Shapiro, A.; Kleywegt, A., Minimax analysis of stochastic problems, Optimization Methods and Software, 17, 3, 523-542 (2002) · Zbl 1040.90030
[57] Studniarski, M., An algorithm for calculating one subgradient of a convex function of two variables, Numerische Mathematik, 55, 6, 685-693 (1989) · Zbl 0671.65044
[58] Volkovs, M.; Yu, G. W.; Poutanen, T., Content-based neighbor models for cold start in recommender systems, Proceedings of the recommender systems challenge 2017. Proceedings of the recommender systems challenge 2017, RecSys Challenge ’17, 1-6 (2017), ACM Press: ACM Press New York, NY
[59] Wolpert, D. H., The lack of a priori distinctions between learning algorithms, Neural Computation, 8, 7, 1341-1390 (1996)
[60] Zhang, T.; Yu, B., Boosting with early stopping: Convergence and consistency, The Annals of Statistics, 33, 4, 1538-1579 (2005) · Zbl 1078.62038
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.