Google Scholar

Active offline policy selection

K Konyushova, Y Chen, T Paine… - Advances in …, 2021 - proceedings.neurips.cc

… Figure 1: Left: Offline policy selection attempts to choose the best policy from a set of policies,
… In offline policy selection (OPS) the task is to select the policy with the highest value from a …

Cited by 25

Offline policy selection under uncertainty

M Yang, B Dai, O Nachum, G Tucker… - International …, 2022 - proceedings.mlr.press

… We formally consider offline policy selection as learning preferences over a set of policy …
evaluation (OPE) is a highly active area of research. While the original motivation for OPE …

Cited by 41

Supported policy optimization for offline reinforcement learning

J Wu, H Wu, Z Qiu, J Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc

… We select the classic BC [36] and state-of-the-art offline RL methods as baselines. For
methods based on dynamic programming, we compare to AWAC [33], Onestep RL [2], TD3+BC [7]…

Cited by 55

MOPO: Model-based offline policy optimization

T Yu, G Thomas, L Yu, S Ermon… - Advances in …, 2020 - proceedings.neurips.cc

… policies without any costly or dangerous active exploration. However, it is also challenging,
due to the distributional shift between the offline … can be selected from a larger set of policies. …

Cited by 817

An offline-to-online reinforcement learning approach based on multi-action evaluation with policy extension

X Cheng, X Huang, Z Huang, N Jiang - Applied Intelligence, 2024 - Springer

… policy due to the overly conservative constraints, offline RL confronts challenges in active …
Then, we consider all possible actions and select them through the action-selection calculation …

Towards Practical Offline Reinforcement Learning: Sample Efficient Policy Selection and Evaluation

V Liu - 2024 - era.library.ualberta.ca

… In the first part of the dissertation, we study the offline policy selection problem. Policy
selection is a … We expect active research topics will be (1) to identify suitable conditions on the …

A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

… in offline RL finding a better policy than the one used to generate the data. Although selecting
the … ]) can work well in the offline RL setting since we can avoid active data collection and …

Cited by 280

A criterion for selecting the appropriate one from the trained models for model‐based offline policy evaluation

C Li, Y Wang, ZM Ma, Y Liu - CAAI Transactions on Intelligence …, 2024 - Wiley Online Library

… offline datasets, thereby disallowing the agent from actively … A challenge encountered
within Offline RL is offline policy … performance of policies relying exclusively on offline data [5, …

Showing your offline reinforcement learning work: Online evaluation budget matters

V Kurenkov, S Kolesnikov - International Conference on …, 2022 - proceedings.mlr.press

… Training and evaluation of deep offline RL algorithms is still in active development, and …
As the EOP incorporates offline policy selection, we can also use it to compare how well OPS …

Cited by 24

Mild policy evaluation for offline actor–critic

L Huang, B Dong, J Lu, W Zhang - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

… offline AC … a mild policy evaluation (MPE) by constraining the difference between the $Q$
values of actions supported by the target policy and those of actions contained within the offline …

Cited by 4
