Google Scholar

Phasic policy gradient

KW Cobbe, J Hilton, O Klimov… - … on Machine Learning, 2021 - proceedings.mlr.press

… Specifically, we set the policy gradient loss (… gradient is the policy gradient estimator,
subject to a constraint on the KL-divergence between the original policy and the updated policy. …

Cited by 180

Resource Allocation in Time Slotted Channel Hopping (TSCH) networks based on phasic policy gradient reinforcement learning

L Bommisetty, TG Venkatesh - Internet of Things, 2022 - Elsevier

… Motivated by these gaps in the literature to design a scheduling algorithm that achieves
global optimal solution, we propose a phasic policy gradient reinforcement learning based …

Cited by 9

Phasic Policy Gradient Based Resource Allocation for Industrial Internet of Things

L Bommisetty… - 2022 IEEE 19th Annual …, 2022 - ieeexplore.ieee.org

… In this paper, we propose a phasic policy gradient (PPG) based TSCH schedule learning …
-critic policy gradient method that learns the scheduling algorithm in two phases, namely policy …

Cited by 2

MAPPG: Multi-Agent Phasic Policy Gradient

Q Zhang, X Zhang, Y Liu, X Zhang… - 2023 62nd IEEE …, 2023 - ieeexplore.ieee.org

We propose a Multi-Agent Phasic Policy Gradient (MAPPG) algorithm, which can assist
agents to further alleviate the non-stationarity of the environment. Different from the existing …

PPG reloaded: an empirical study on what matters in phasic policy gradient

K Wang, D Zhou, J Feng, S Mannor - 2023 - openreview.net

… In model-free reinforcement learning, recent methods based on a phasic policy gradient (PPG) …
However, through an extensive empirical study, we unveil that policy regularization and …

Spike-based reinforcement learning in continuous state and action space: when policy gradient methods fail

E Vasilaki, N Frémaux, R Urbanczik… - PLoS computational …, 2009 - journals.plos.org

… The family of learning rules includes an optimal rule derived from policy gradient methods
as … We show that in this architecture, a standard policy gradient rule fails to solve the Morris …

Cited by 149

Phasic self-imitative reduction for sparse-reward goal-conditioned reinforcement learning

Y Li, T Gao, J Yang, H Xu, Y Wu - … conference on machine …, 2022 - proceedings.mlr.press

… The RL objective is policy gradient over rollout data (Eqn. 1), which requires (primarily) on-policy
samples (both success and failures) to make policy improvement. The SL objective (Eqn…

Cited by 19

Correcting discount-factor mismatch in on-policy policy gradient methods

F Che, G Vasan, AR Mahmood - … Conference on Machine …, 2023 - proceedings.mlr.press

… other on-policy policy gradient estimators, including batch actor-critic (Konda & Tsitsiklis 1999)
and proximal policy … Our work establishes a more principled policy gradient estimator with …

Cited by 6

Model-free policy learning with reward gradients

Q Lan, S Tosatto, H Farrahi, AR Mahmood - arXiv preprint arXiv …, 2021 - arxiv.org

… Reward Policy Gradient estimator, a novel approach that integrates reward gradients without …
, we develop a new policy gradient estimator—the Reward Policy Gradient (RPG) estimator…

Cited by 10

Policy gradient with serial Markov chain reasoning

E Cetin, O Celiktutan - Advances in Neural Information …, 2022 - proceedings.neurips.cc

… method to estimate the policy gradient. Hence, we implement a new effective off-policy
algorithm for maximum entropy reinforcement learning (MaxEnt RL) [27, 28], named Steady-State …

Cited by 1
