
Risk-sensitive control of Markov decision processes: a moment-based approach with target distributions. (English) Zbl 1458.90401

Summary: In many revenue management applications, risk-averse decision-making is crucial. In dynamic settings, however, it is challenging to find the right balance between maximizing expected rewards and minimizing various kinds of risk. Existing approaches use utility functions, chance constraints, or (conditional) value-at-risk considerations to shape the distribution of rewards in a preferred way. Nevertheless, common techniques are not flexible enough and are typically numerically complex. In our model, we exploit the fact that a distribution is characterized by its mean and higher moments. We present a multi-valued dynamic programming heuristic to compute risk-sensitive feedback policies that directly control the moments of future rewards. Our approach is based on recursive formulations of higher moments and does not require an extension of the state space. Finally, we propose a self-tuning algorithm that identifies feedback policies approximating predetermined (risk-sensitive) target distributions. We illustrate the effectiveness and flexibility of our approach in different dynamic pricing scenarios.
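The phrase "recursive formulations of higher moments ... without an extension of the state space" can be illustrated with a small dynamic pricing sketch. This is not the authors' algorithm; it is a simplified illustration under stated assumptions: a single product with inventory n, a finite horizon of T periods, at most one sale per period with a hypothetical sale probability `sale_prob(p)`, and a hypothetical variance weight `kappa`. Going backward in time, each candidate price is scored by mean minus `kappa` times variance, where the first two moments of future revenue are obtained from the continuation moments via E[(r + R')^2] = E[r^2] + 2 E[r R'] + E[R'^2]. Because the recursion runs over the original state (t, n) only, accumulated revenue never enters the state. The resulting feedback policy is a heuristic (mean-variance criteria are not time-consistent), but the stored moments are exact for the policy it constructs.

```python
import math


def mean_variance_pricing(T, N, prices, sale_prob, kappa):
    """Backward recursion over (t, n) tracking the first two moments of future revenue.

    T: number of periods; N: initial inventory; prices: candidate prices.
    sale_prob(p): hypothetical probability of selling one unit in a period at price p.
    kappa: hypothetical risk-aversion weight on the variance.
    Returns policy[t][n], M1[t][n] = E[future revenue], M2[t][n] = E[future revenue^2].
    """
    # Terminal condition: no revenue after the horizon, so both moments are zero.
    M1 = [[0.0] * (N + 1) for _ in range(T + 1)]
    M2 = [[0.0] * (N + 1) for _ in range(T + 1)]
    policy = [[None] * (N + 1) for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for n in range(1, N + 1):  # n == 0: sold out, moments stay zero
            best = None
            for p in prices:
                q = sale_prob(p)
                # Sale: earn p and move to n - 1; no sale: stay at n.
                m1 = q * (p + M1[t + 1][n - 1]) + (1 - q) * M1[t + 1][n]
                m2 = (q * (p * p + 2 * p * M1[t + 1][n - 1] + M2[t + 1][n - 1])
                      + (1 - q) * M2[t + 1][n])
                score = m1 - kappa * (m2 - m1 * m1)  # mean-variance criterion
                if best is None or score > best[0]:
                    best = (score, p, m1, m2)
            _, policy[t][n], M1[t][n], M2[t][n] = best
    return policy, M1, M2


if __name__ == "__main__":
    def demand(p):
        # Hypothetical demand model: sale probability decays with price.
        return math.exp(-p / 10.0)

    for kappa in (0.0, 0.02):
        pol, M1, M2 = mean_variance_pricing(20, 5, [5, 10, 15, 20], demand, kappa)
        mean, var = M1[0][5], M2[0][5] - M1[0][5] ** 2
        print(f"kappa={kappa}: first price={pol[0][5]}, "
              f"mean={mean:.2f}, std={math.sqrt(max(var, 0.0)):.2f}")
```

With `kappa = 0` the recursion reduces to the risk-neutral dynamic program; larger values trade expected revenue for lower variance. Third and higher moments follow the same pattern through the binomial expansion of E[(r + R')^k].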

MSC:

90B50 Management decision making, including multiple objectives
90C40 Markov and semi-Markov decision processes
91B06 Decision theory
