Active offline policy selection

K Konyushova, Y Chen, T Paine… - Advances in …, 2021 - proceedings.neurips.cc
… Figure 1: Left: Offline policy selection attempts to choose the best policy from a set of policies,
… In offline policy selection (OPS) the task is to select the policy with the highest value from a …

Offline policy selection under uncertainty

M Yang, B Dai, O Nachum, G Tucker… - International …, 2022 - proceedings.mlr.press
… We formally consider offline policy selection as learning preferences over a set of policy
evaluation (OPE) is a highly active area of research. While the original motivation for OPE …

Supported policy optimization for offline reinforcement learning

J Wu, H Wu, Z Qiu, J Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
… We select the classic BC [36] and state-of-the-art offline RL methods as baselines. For
methods based on dynamic programming, we compare to AWAC [33], Onestep RL [2], TD3+BC [7]…

MOPO: Model-based offline policy optimization

T Yu, G Thomas, L Yu, S Ermon… - Advances in …, 2020 - proceedings.neurips.cc
policies without any costly or dangerous active exploration. However, it is also challenging,
due to the distributional shift between the offline … can be selected from a larger set of policies. …

An offline-to-online reinforcement learning approach based on multi-action evaluation with policy extension

X Cheng, X Huang, Z Huang, N Jiang - Applied Intelligence, 2024 - Springer
policy due to the overly conservative constraints, offline RL confronts challenges in active
Then, we consider all possible actions and select them through the action-selection calculation …

Towards Practical Offline Reinforcement Learning: Sample Efficient Policy Selection and Evaluation

V Liu - 2024 - era.library.ualberta.ca
… In the first part of the dissertation, we study the offline policy selection problem. Policy
selection is a … We expect active research topics will be (1) to identify suitable conditions on the …

A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
… in offline RL finding a better policy than the one used to generate the data. Although selecting
the … ]) can work well in the offline RL setting since we can avoid active data collection and …

A criterion for selecting the appropriate one from the trained models for model‐based offline policy evaluation

C Li, Y Wang, ZM Ma, Y Liu - CAAI Transactions on Intelligence …, 2024 - Wiley Online Library
offline datasets, thereby disallowing the agent from actively … A challenge encountered
within Offline RL is offline policy … performance of policies relying exclusively on offline data [5, …

Showing your offline reinforcement learning work: Online evaluation budget matters

V Kurenkov, S Kolesnikov - International Conference on …, 2022 - proceedings.mlr.press
… Training and evaluation of deep offline RL algorithms is still in active development, and …
As the EOP incorporates offline policy selection, we can also use it to compare how well OPS …

Mild policy evaluation for offline actor–critic

L Huang, B Dong, J Lu, W Zhang - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
offline AC … a mild policy evaluation (MPE) by constraining the difference between the $Q$
values of actions supported by the target policy and those of actions contained within the offline