Document Search Results

Found 22 Documents (Results 1–22)

Chandak, Siddharth; Borkar, Vivek S.; Dolhare, Harsh

A concentration bound for \(\operatorname{LSPE}(\lambda)\). (English) Zbl 1505.93252

Syst. Control Lett. 171, Article ID 105418, 9 p. (2023).

MSC: 93E03

Kallus, Nathan; Uehara, Masatoshi

Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning. (English) Zbl 1510.90285

Oper. Res. 70, No. 6, 3282-3302 (2022).

MSC: 90C40 90C90

Forootani, Ali; Iervolino, Raffaele; Tipaldi, Massimo; Dey, Subhrakanti

Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming. (English) Zbl 1485.93629

Automatica 136, Article ID 110061, 14 p. (2022).

Reviewer: Svetlana A. Kravchenko (Minsk)

MSC: 93E20 93E11 90C40 93C05 49L20

Forootani, Ali; Liuzza, Davide; Tipaldi, Massimo; Glielmo, Luigi

Allocating resources via price management systems: a dynamic programming-based approach. (English) Zbl 1480.91115

Int. J. Control 94, No. 8, 2123-2143 (2021).

MSC: 91B32 91B24 90C39 90C40

Doan, Thinh T.; Maguluri, Siva Theja; Romberg, Justin

Finite-time performance of distributed temporal-difference learning with linear function approximation. (English) Zbl 1483.68294

SIAM J. Math. Data Sci. 3, No. 1, 298-320 (2021).

MSC: 68T05 68T42 68W15 68W40 90C40

Kim, Michael Jong

Variance regularization in sequential Bayesian optimization. (English) Zbl 1459.90219

Math. Oper. Res. 45, No. 3, 966-992 (2020).

Reviewer: Giorgio Gnecco (Lucca)

MSC: 90C39 62C10

Joseph, Ajin George; Bhatnagar, Shalabh

An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method. (English) Zbl 1473.68152

Mach. Learn. 107, No. 8-10, 1385-1429 (2018).

MSC: 68T05 62L20 68W27

Bertsekas, Dimitri P.

Proximal algorithms and temporal difference methods for solving fixed point problems. (English) Zbl 1471.90159

Comput. Optim. Appl. 70, No. 3, 709-736 (2018).

MSC: 90C39 90C25

Cui, Yunduan; Matsubara, Takamitsu; Sugimoto, Kenji

Kernel dynamic policy programming: applicable reinforcement learning to robot systems with high dimensional states. (English) Zbl 1429.68212

Neural Netw. 94, 13-23 (2017).

MSC: 68T05 68T40

Cheng, Kang; Zhang, Kanjian; Fei, Shumin; Wei, Haikun

Potential-based least-squares policy iteration for a parameterized feedback control system. (English) Zbl 1342.49047

J. Optim. Theory Appl. 169, No. 2, 692-704 (2016).

MSC: 49M30 49K45 49N35 93E20 93B52 90C40 93C55

Xu, Xin; Zuo, Lei; Huang, Zhenhua

Reinforcement learning algorithms with function approximation: recent advances and applications. (English) Zbl 1328.68176

Inf. Sci. 261, 1-31 (2014).

MSC: 68T05 60J20

Cheng, Kang; Fei, Shumin; Zhang, Kanjian; Liu, Xiaomei; Wei, Haikun

Temporal difference-based policy iteration for optimal control of stochastic systems. (English) Zbl 1306.93074

J. Optim. Theory Appl. 163, No. 1, 165-180 (2014).

Reviewer: Andrzej Świerniak (Gliwice)

MSC: 93E20 49M30 49J55 49L20 93E03 93C55 90C39

Wang, Mengdi; Bertsekas, Dimitri P.

On the convergence of simulation-based iterative methods for solving singular linear systems. (English) Zbl 1295.65037

Stoch. Syst. 3, No. 1, 38-95 (2013).

MSC: 65F10 65F20 65C05

Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien

Batch mode reinforcement learning based on the synthesis of artificial trajectories. (English) Zbl 1276.68134

Ann. Oper. Res. 208, 383-416 (2013).

MSC: 68T05 93E35

Bertsekas, Dimitri P.

Approximate policy iteration: a survey and some new methods. (English) Zbl 1249.90179

J. Control Theory Appl. 9, No. 3, 310-335 (2011).

MSC: 90C15 90C39

Wawrzyński, Paweł

Real-time reinforcement learning by sequential actor-critics and experience replay. (English) Zbl 1396.68107

Neural Netw. 22, No. 10, 1484-1497 (2009).

MSC: 68T05 93C40

Bertsekas, Dimitri P.; Yu, Huizhen

Projected equation methods for approximate solution of large linear systems. (English) Zbl 1165.65010

J. Comput. Appl. Math. 227, No. 1, 27-50 (2009).

Reviewer: Jiri Náprstek (Praha)

MSC: 65F10 65F30 65C05 65C40 60J20 49L20 60J22 65F20

Drugowitsch, Jan; Barry, Alwyn M.

A formal framework and extensions for function approximation in learning classifier systems. (English) Zbl 1470.68099

Mach. Learn. 70, No. 1, 45-88 (2008).

MSC: 68T05 62H30

Barman, Kishor; Borkar, Vivek S.

A note on linear function approximation using random projections. (English) Zbl 1153.93037

Syst. Control Lett. 57, No. 9, 784-786 (2008).

MSC: 93E25 15A18 51K05

Sarimveis, Haralambos; Patrinos, Panagiotis; Tarantilis, Chris D.; Kiranoudis, Chris T.

Dynamic modeling and control of supply chain systems: A review. (English) Zbl 1146.90353

Comput. Oper. Res. 35, No. 11, 3530-3561 (2008).

MSC: 90B10

Tadić, Vladislav B.

Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes. (English) Zbl 1470.68185

Mach. Learn. 63, No. 2, 107-133 (2006).

MSC: 68T05 60J20 60K25 62M10 68W40

Tadić, Vladislav B.

Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes. (English) Zbl 1102.68753

Mach. Learn. 63, No. 2, 107-133 (2006).

MSC: 68W40

Year of Publication

2023 (1)
2022 (2)
2021 (2)
2020 (1)
2018 (2)
2017 (1)
2016 (1)
2014 (2)
2013 (2)
2011 (1)
2009 (2)
2008 (3)
2006 (2)

Main Field

90-XX (10)
68-XX (9)
93-XX (7)
49-XX (4)
62-XX (4)
60-XX (3)
65-XX (2)
15-XX (1)
51-XX (1)
91-XX (1)
