Chandak, Siddharth; Borkar, Vivek S.; Dolhare, Harsh: A concentration bound for \(\operatorname{LSPE}(\lambda)\). (English) Zbl 1505.93252 Syst. Control Lett. 171, Article ID 105418, 9 p. (2023). MSC: 93E03
Kallus, Nathan; Uehara, Masatoshi: Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning. (English) Zbl 1510.90285 Oper. Res. 70, No. 6, 3282-3302 (2022). MSC: 90C40 90C90
Forootani, Ali; Iervolino, Raffaele; Tipaldi, Massimo; Dey, Subhrakanti: Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming. (English) Zbl 1485.93629 Automatica 136, Article ID 110061, 14 p. (2022). Reviewer: Svetlana A. Kravchenko (Minsk) MSC: 93E20 93E11 90C40 93C05 49L20
Forootani, Ali; Liuzza, Davide; Tipaldi, Massimo; Glielmo, Luigi: Allocating resources via price management systems: a dynamic programming-based approach. (English) Zbl 1480.91115 Int. J. Control 94, No. 8, 2123-2143 (2021). MSC: 91B32 91B24 90C39 90C40
Doan, Thinh T.; Maguluri, Siva Theja; Romberg, Justin: Finite-time performance of distributed temporal-difference learning with linear function approximation. (English) Zbl 1483.68294 SIAM J. Math. Data Sci. 3, No. 1, 298-320 (2021). MSC: 68T05 68T42 68W15 68W40 90C40
Kim, Michael Jong: Variance regularization in sequential Bayesian optimization. (English) Zbl 1459.90219 Math. Oper. Res. 45, No. 3, 966-992 (2020). Reviewer: Giorgio Gnecco (Lucca) MSC: 90C39 62C10
Joseph, Ajin George; Bhatnagar, Shalabh: An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method. (English) Zbl 1473.68152 Mach. Learn. 107, No. 8-10, 1385-1429 (2018). MSC: 68T05 62L20 68W27
Bertsekas, Dimitri P.: Proximal algorithms and temporal difference methods for solving fixed point problems. (English) Zbl 1471.90159 Comput. Optim. Appl. 70, No. 3, 709-736 (2018). MSC: 90C39 90C25
Cui, Yunduan; Matsubara, Takamitsu; Sugimoto, Kenji: Kernel dynamic policy programming: applicable reinforcement learning to robot systems with high dimensional states. (English) Zbl 1429.68212 Neural Netw. 94, 13-23 (2017). MSC: 68T05 68T40
Cheng, Kang; Zhang, Kanjian; Fei, Shumin; Wei, Haikun: Potential-based least-squares policy iteration for a parameterized feedback control system. (English) Zbl 1342.49047 J. Optim. Theory Appl. 169, No. 2, 692-704 (2016). MSC: 49M30 49K45 49N35 93E20 93B52 90C40 93C55
Xu, Xin; Zuo, Lei; Huang, Zhenhua: Reinforcement learning algorithms with function approximation: recent advances and applications. (English) Zbl 1328.68176 Inf. Sci. 261, 1-31 (2014). MSC: 68T05 60J20
Cheng, Kang; Fei, Shumin; Zhang, Kanjian; Liu, Xiaomei; Wei, Haikun: Temporal difference-based policy iteration for optimal control of stochastic systems. (English) Zbl 1306.93074 J. Optim. Theory Appl. 163, No. 1, 165-180 (2014). Reviewer: Andrzej Świerniak (Gliwice) MSC: 93E20 49M30 49J55 49L20 93E03 93C55 90C39
Wang, Mengdi; Bertsekas, Dimitri P.: On the convergence of simulation-based iterative methods for solving singular linear systems. (English) Zbl 1295.65037 Stoch. Syst. 3, No. 1, 38-95 (2013). MSC: 65F10 65F20 65C05
Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien: Batch mode reinforcement learning based on the synthesis of artificial trajectories. (English) Zbl 1276.68134 Ann. Oper. Res. 208, 383-416 (2013). MSC: 68T05 93E35
Bertsekas, Dimitri P.: Approximate policy iteration: a survey and some new methods. (English) Zbl 1249.90179 J. Control Theory Appl. 9, No. 3, 310-335 (2011). MSC: 90C15 90C39
Wawrzyński, Paweł: Real-time reinforcement learning by sequential actor-critics and experience replay. (English) Zbl 1396.68107 Neural Netw. 22, No. 10, 1484-1497 (2009). MSC: 68T05 93C40
Bertsekas, Dimitri P.; Yu, Huizhen: Projected equation methods for approximate solution of large linear systems. (English) Zbl 1165.65010 J. Comput. Appl. Math. 227, No. 1, 27-50 (2009). Reviewer: Jiri Náprstek (Praha) MSC: 65F10 65F30 65C05 65C40 60J20 49L20 60J22 65F20
Drugowitsch, Jan; Barry, Alwyn M.: A formal framework and extensions for function approximation in learning classifier systems. (English) Zbl 1470.68099 Mach. Learn. 70, No. 1, 45-88 (2008). MSC: 68T05 62H30
Barman, Kishor; Borkar, Vivek S.: A note on linear function approximation using random projections. (English) Zbl 1153.93037 Syst. Control Lett. 57, No. 9, 784-786 (2008). MSC: 93E25 15A18 51K05
Sarimveis, Haralambos; Patrinos, Panagiotis; Tarantilis, Chris D.; Kiranoudis, Chris T.: Dynamic modeling and control of supply chain systems: A review. (English) Zbl 1146.90353 Comput. Oper. Res. 35, No. 11, 3530-3561 (2008). MSC: 90B10
Tadić, Vladislav B.: Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes. (English) Zbl 1470.68185 Mach. Learn. 63, No. 2, 107-133 (2006). MSC: 68T05 60J20 60K25 62M10 68W40
Tadić, Vladislav B.: Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes. (English) Zbl 1102.68753 Mach. Learn. 63, No. 2, 107-133 (2006). MSC: 68W40