Approximate dynamic programming for stochastic \(N\)-stage optimization with application to optimal consumption under uncertainty. (English) Zbl 1321.90093

Summary: Stochastic optimization problems with an objective function that is additive over a finite number of stages are addressed. Although Dynamic Programming allows one to formally solve such problems, closed-form solutions can be derived only in particular cases. The search for suboptimal solutions via two approaches is addressed: approximation of the value functions and approximation of the optimal decision policies. The approximations take on the form of linear combinations of basis functions containing adjustable parameters to be optimized together with the coefficients of the combinations. Two kinds of basis functions are considered: Gaussians with varying centers and widths and sigmoids with varying weights and biases. The accuracies of such suboptimal solutions are investigated via estimates of the error propagation through the stages. Upper bounds are derived on the differences between the optimal value of the objective functional and its suboptimal values corresponding to the use at each stage of approximate value functions and approximate policies. Conditions under which the number of basis functions required for a desired approximation accuracy does not grow “too fast” with respect to the dimensions of the state and random vectors are provided. As an example of application, a multidimensional problem of optimal consumption under uncertainty is investigated, where consumers aim at maximizing a social utility function. Numerical simulations are provided, emphasizing computational pros and cons of the two approaches (i.e., value-function approximation and optimal-policy approximation) using the above-mentioned two kinds of basis functions. To investigate the dependencies of the performances on dimensionality, the numerical analysis is performed for various numbers of consumers. In the simulations, discretization techniques exploiting low-discrepancy sequences are used. Both theoretical and numerical results give insights into the possibility of coping with the curse of dimensionality in stochastic optimization problems whose decision strategies depend on large numbers of variables.

MSC:

90C15 Stochastic programming
90C39 Dynamic programming
91B42 Consumer behavior, demand theory

Software:

ApproxRL
