×

Dynamic programming and value-function approximation in sequential decision problems: error analysis and numerical results. (English) Zbl 1262.90186

Summary: Value-function approximation is investigated for the solution via Dynamic Programming (DP) of continuous-state sequential \(N\)-stage decision problems, in which the reward to be maximized has an additive structure over a finite number of stages. Conditions that guarantee smoothness properties of the value function at each stage are derived. These properties are exploited to approximate such functions by means of certain nonlinear approximation schemes, which include splines of suitable order and Gaussian radial-basis networks with variable centers and widths. The accuracies of suboptimal solutions obtained by combining DP with these approximation tools are estimated. The results provide insights into the successful performances appeared in the literature about the use of value-function approximators in DP. The theoretical analysis is applied to a problem of optimal consumption, with simulation results illustrating the use of the proposed solution methodology. Numerical comparisons with classical linear approximators are presented.

MSC:

90C39 Dynamic programming
91B06 Decision theory
Full Text: DOI

References:

[1] Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957) · Zbl 0077.13605
[2] Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996) · Zbl 0924.68163
[3] Powell, W.B.: Approximate Dynamic Programming–Solving the Curses of Dimensionality. Wiley, Hoboken (2007) · Zbl 1156.90021
[4] Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.): Handbook of Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)
[5] Zoppoli, R., Parisini, T., Sanguineti, M., Gnecco, G.: Neural Approximations for Optimal Control and Decision. Springer, London (2012, in preparation)
[6] Haykin, S.: Neural Networks: a Comprehensive Foundation. Prentice Hall, New York (1998) · Zbl 0828.68103
[7] Bertsekas, D.P.: Dynamic Programming and Optimal Control vol. 1. Athena Scientific, Belmont (2005)
[8] Bellman, R., Dreyfus, S.: Functional approximations and dynamic programming. Math. Tables Other Aids Comput. 13, 247–251 (1959) · Zbl 0095.34403 · doi:10.2307/2002797
[9] Bellman, R., Kalaba, R., Kotkin, B.: Polynomial approximation–a new computational technique in dynamic programming. Math. Comput. 17, 155–161 (1963) · Zbl 0123.37303
[10] Foufoula-Georgiou, E., Kitanidis, P.K.: Gradient dynamic programming for stochastic optimal control of multidimensional water resources systems. Water Resour. Res. 24, 1345–1359 (1988) · doi:10.1029/WR024i008p01345
[11] Johnson, S., Stedinger, J., Shoemaker, C., Li, Y., Tejada-Guibert, J.: Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41, 484–500 (1993) · Zbl 0777.90074 · doi:10.1287/opre.41.3.484
[12] Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999) · Zbl 0979.90094 · doi:10.1287/opre.47.1.38
[13] Cervellera, C., Muselli, M.: Efficient sampling in approximate dynamic programming algorithms. Comput. Optim. Appl. 38, 417–443 (2007) · Zbl 1171.90538 · doi:10.1007/s10589-007-9054-8
[14] Philbrick, C.R. Jr., Kitanidis, P.K.: Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Oper. Res. 49, 398–412 (2001) · Zbl 1163.90783 · doi:10.1287/opre.49.3.398.11219
[15] Judd, K.: Numerical Methods in Economics. MIT Press, Cambridge (1998) · Zbl 0924.65001
[16] Kurková, V., Sanguineti, M.: Comparison of worst-case errors in linear and neural network approximation. IEEE Trans. Inf. Theory 48, 264–275 (2002) · Zbl 1059.62589 · doi:10.1109/18.971754
[17] Tesauro, G.: Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992) · Zbl 0772.68075
[18] Gnecco, G., Sanguineti, M., Gaggero, M.: Suboptimal solutions to team optimization problems with stochastic information structure. SIAM J. Optim. 22, 212–243 (2012) · Zbl 1259.90047 · doi:10.1137/100803481
[19] Tsitsiklis, J.N., Roy, B.V.: Feature-based methods for large scale dynamic programming. Mach. Learn. 22, 59–94 (1996) · Zbl 1099.90586
[20] Zoppoli, R., Sanguineti, M., Parisini, T.: Approximating networks and extended Ritz method for the solution of functional optimization problems. J. Optim. Theory Appl. 112, 403–439 (2002) · Zbl 0992.65072 · doi:10.1023/A:1013662124879
[21] Alessandri, A., Gaggero, M., Zoppoli, R.: Feedback optimal control of distributed parameter systems by using finite-dimensional approximation schemes. IEEE Trans. Neural Netw. Learn. Syst. 23(6), 984–996 (2012) · doi:10.1109/TNNLS.2012.2192748
[22] Stokey, N.L., Lucas, R.E., Prescott, E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989) · Zbl 0774.90018
[23] Bertsekas, D.P.: Dynamic Programming and Optimal Control vol. 2. Athena Scientific, Belmont (2007)
[24] White, D.J.: Markov Decision Processes. Wiley, New York (1993)
[25] Puterman, M.L., Shin, M.C.: Modified policy iteration algorithms for discounted Markov decision processes. Manag. Sci. 41, 1127–1137 (1978) · Zbl 0391.90093 · doi:10.1287/mnsc.24.11.1127
[26] Altman, E., Nain, P.: Optimal control of the M/G/1 queue with repeated vacations of the server. IEEE Trans. Autom. Control 38, 1766–1775 (1993) · Zbl 0810.90039 · doi:10.1109/9.250556
[27] Lendaris, G.G., Neidhoefer, J.C.: Guidance in the choice of adaptive critics for control. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 97–124. IEEE Press, New York (2004)
[28] Karp, L., Lee, I.H.: Learning-by-doing and the choice of technology: the role of patience. J. Econ. Theory 100, 73–92 (2001) · Zbl 0996.91081 · doi:10.1006/jeth.2000.2727
[29] Rapaport, A., Sraidi, S., Terreaux, J.: Optimality of greedy and sustainable policies in the management of renewable resources. Optim. Control Appl. Methods 24, 23–44 (2003) · Zbl 1073.49018 · doi:10.1002/oca.718
[30] Semmler, W., Sieveking, M.: Critical debt and debt dynamics. J. Econ. Dyn. Control 24, 1121–1144 (2000) · Zbl 1136.91541 · doi:10.1016/S0165-1889(99)00039-1
[31] Nawijn, W.M.: Look-ahead policies for admission to a single-server loss system. Oper. Res. 38, 854–862 (1990) · Zbl 0723.90024 · doi:10.1287/opre.38.5.854
[32] Gnecco, G., Sanguineti, M.: Suboptimal solutions to dynamic optimization problems via approximations of the policy functions. J. Optim. Theory Appl. 146, 764–794 (2010) · Zbl 1254.90285 · doi:10.1007/s10957-010-9680-7
[33] Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, Berlin (1993) · Zbl 0795.49001
[34] Stein, E.M.: Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton (1970) · Zbl 0207.13501
[35] Singer, I.: Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin (1970) · Zbl 0197.38601
[36] Kurková, V., Sanguineti, M.: Geometric upper bounds on rates of variable-basis approximation. IEEE Trans. Inf. Theory 54, 5681–5688 (2008) · Zbl 1319.68177 · doi:10.1109/TIT.2008.2006383
[37] Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39, 930–945 (1993) · Zbl 0818.68126 · doi:10.1109/18.256500
[38] Gnecco, G., Kurková, V., Sanguineti, M.: Some comparisons of complexity in dictionary-based and linear computational models. Neural Netw. 24, 171–182 (2011) · Zbl 1217.68184 · doi:10.1016/j.neunet.2010.10.002
[39] Wahba, G.: Spline Models for Observational Data. CBMS-NSF Regional Conf. Series in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990) · Zbl 0813.62001
[40] Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996) · doi:10.1162/neco.1996.8.1.164
[41] Kainen, P.C., Kurková, V., Sanguineti, M.: Complexity of Gaussian radial-basis networks approximating smooth functions. J. Complex. 25, 63–74 (2009) · Zbl 1162.65006 · doi:10.1016/j.jco.2008.08.001
[42] Alessandri, A., Gnecco, G., Sanguineti, M.: Minimizing sequences for a family of functional optimal estimation problems. J. Optim. Theory Appl. 147, 243–262 (2010) · Zbl 1202.49034 · doi:10.1007/s10957-010-9720-3
[43] Adda, J., Cooper, R.: Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge (2003)
[44] Fang, K.T., Wang, Y.: Number-Theoretic Methods in Statistics. Chapman & Hall, London (1994) · Zbl 0925.65263
[45] Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods, Methuen, London (1964)
[46] Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992) · Zbl 0761.65002
[47] Sobol’, I.: The distribution of points in a cube and the approximate evaluation of integrals. Zh. Vychisl. Mat. Mat. Fiz. 7, 784–802 (1967)
[48] Loomis, L.H.: An Introduction to Abstract Harmonic Analysis. Van Nostrand, Princeton (1953) · Zbl 0052.11701
[49] Boldrin, M., Montrucchio, L.: On the indeterminacy of capital accumulation paths. J. Econ. Theory 40, 26–39 (1986) · Zbl 0662.90021 · doi:10.1016/0022-0531(86)90005-0
[50] Dawid, H., Kopel, M., Feichtinger, G.: Complex solutions of nonconcave dynamic optimization models. Econ. Theory 9, 427–439 (1997) · Zbl 0899.90024 · doi:10.1007/BF01213847
[51] Chambers, J., Cleveland, W.: Graphical Methods for Data Analysis. Wadsworth/Cole Publishing Company, Pacific Grove (1983) · Zbl 0532.65094
[52] Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006) · Zbl 1104.65059
[53] Zhang, F. (ed.): The Schur Complement and Its Applications. Springer, New York (2005) · Zbl 1075.15002
[54] Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Oxford Science Publications, Oxford (2004) · Zbl 1098.68556
[55] Hornik, K., Stinchcombe, M., White, H., Auer, P.: Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Comput. 6, 1262–1275 (1994) · Zbl 0846.60088 · doi:10.1162/neco.1994.6.6.1262
[56] Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Academic Press, San Diego (2003)
[57] Rudin, W.: Functional Analysis. McGraw-Hill, New York (1973) · Zbl 0253.46001
[58] Gnecco, G., Sanguineti, M.: Approximation error bounds via Rademacher’s complexity. Appl. Math. Sci. 2, 153–176 (2008) · Zbl 1169.42320
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.