Phasic policy gradient

KW Cobbe, J Hilton, O Klimov… - … on Machine Learning, 2021 - proceedings.mlr.press
… Specifically, we set the policy gradient loss (… gradient is the policy gradient estimator,
subject to a constraint on the KL-divergence between the original policy and the updated policy. …
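As an illustration of the KL constraint mentioned in this snippet, PPG's auxiliary phase trains the policy network's auxiliary value head while a behavioral-cloning KL term keeps the updated policy close to the original one: L_joint = L_aux + β_clone · KL[π_old ‖ π_θ]. The following is a minimal pure-Python sketch under stated assumptions. It assumes discrete actions, with each state's action distribution given as a list of probabilities. The helper names `kl_categorical` and `ppg_joint_loss` are hypothetical and are not taken from the paper's code.

```python
import math

def kl_categorical(p_old, p_new):
    """KL[p_old || p_new] for two categorical distributions given as probability lists."""
    return sum(po * math.log(po / pn) for po, pn in zip(p_old, p_new) if po > 0)

def ppg_joint_loss(aux_values, value_targets, old_probs, new_probs, beta_clone=1.0):
    """Hypothetical helper: L_joint = L_aux + beta_clone * mean KL[pi_old || pi_theta].

    aux_values: predictions of the policy network's auxiliary value head, one per state.
    value_targets: value targets for the same states.
    old_probs / new_probs: per-state action distributions before and after the update.
    """
    n = len(aux_values)
    # Value-distillation term: half the mean squared error against the targets.
    l_aux = 0.5 * sum((v - t) ** 2 for v, t in zip(aux_values, value_targets)) / n
    # Behavioral-cloning term: mean KL between the original and the updated policy.
    kl = sum(kl_categorical(po, pn) for po, pn in zip(old_probs, new_probs)) / n
    return l_aux + beta_clone * kl
```

If the policy is unchanged and the auxiliary head matches its targets, both terms vanish. Raising `beta_clone` penalizes drift of the policy more heavily during distillation.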

Resource Allocation in Time Slotted Channel Hopping (TSCH) networks based on phasic policy gradient reinforcement learning

L Bommisetty, TG Venkatesh - Internet of Things, 2022 - Elsevier
… Motivated by these gaps in the literature to design a scheduling algorithm that achieves
global optimal solution, we propose a phasic policy gradient reinforcement learning based …

Phasic Policy Gradient Based Resource Allocation for Industrial Internet of Things

L Bommisetty… - 2022 IEEE 19th Annual …, 2022 - ieeexplore.ieee.org
… In this paper, we propose a phasic policy gradient (PPG) based TSCH schedule learning …
-critic policy gradient method that learns the scheduling algorithm in two phases, namely policy …
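The two-phase structure described here follows the original PPG algorithm. A policy phase of several on-policy iterations is followed by an auxiliary phase that reuses the data buffered during those iterations. The schedule can be sketched as an ordered list of training steps. This is a minimal sketch: the function name `ppg_schedule` is hypothetical. The epoch counts are illustrative defaults, not settings taken from the TSCH papers above.

```python
def ppg_schedule(n_phases, n_pi=32, e_pi=1, e_v=1, e_aux=6):
    """Return the ordered list of training steps for a PPG-style run (illustrative).

    Each phase runs n_pi policy iterations (collect a rollout, then e_pi policy
    epochs and e_v value epochs), followed by e_aux auxiliary epochs.
    """
    steps = []
    for phase in range(n_phases):
        # Policy phase: on-policy updates with separate policy and value training.
        for it in range(n_pi):
            steps.append(("rollout", phase, it))
            steps.extend([("policy_epoch", phase, it)] * e_pi)
            steps.extend([("value_epoch", phase, it)] * e_v)
        # Auxiliary phase: reuse the data buffered during the policy phase.
        steps.extend([("aux_epoch", phase, None)] * e_aux)
    return steps
```

With `n_phases=1, n_pi=2, e_aux=3`, the schedule has nine steps. It alternates rollouts with policy and value epochs, then ends with three auxiliary epochs.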

MAPPG: Multi-Agent Phasic Policy Gradient

Q Zhang, X Zhang, Y Liu, X Zhang… - 2023 62nd IEEE …, 2023 - ieeexplore.ieee.org
We propose a Multi-Agent Phasic Policy Gradient (MAPPG) algorithm, which can assist
agents to further alleviate the non-stationarity of the environment. Different from the existing …

PPG reloaded: an empirical study on what matters in phasic policy gradient

K Wang, D Zhou, J Feng, S Mannor - 2023 - openreview.net
… In model-free reinforcement learning, recent methods based on a phasic policy gradient (PPG) …
However, through an extensive empirical study, we unveil that policy regularization and …

Spike-based reinforcement learning in continuous state and action space: when policy gradient methods fail

E Vasilaki, N Frémaux, R Urbanczik… - PLoS computational …, 2009 - journals.plos.org
… The family of learning rules includes an optimal rule derived from policy gradient methods
as … We show that in this architecture, a standard policy gradient rule fails to solve the Morris …

Phasic self-imitative reduction for sparse-reward goal-conditioned reinforcement learning

Y Li, T Gao, J Yang, H Xu, Y Wu - … conference on machine …, 2022 - proceedings.mlr.press
… The RL objective is policy gradient over rollout data (Eqn. 1), which requires (primarily) on-policy
samples (both success and failures) to make policy improvement. The SL objective (Eqn…

Correcting discount-factor mismatch in on-policy policy gradient methods

F Che, G Vasan, AR Mahmood - … Conference on Machine …, 2023 - proceedings.mlr.press
… other on-policy policy gradient estimators, including batch actor-critic (Konda & Tsitsiklis 1999)
and proximal policy … Our work establishes a more principled policy gradient estimator with …
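The mismatch named in this title can be illustrated with the textbook REINFORCE estimator for the discounted objective. There, the exact gradient weights the step-t term γ^t · G_t · ∇log π(a_t | s_t) by γ^t. Common on-policy implementations drop the γ^t factor and use G_t alone. The sketch below only contrasts those two weightings. It is not the corrected estimator proposed in this paper. The helper names are hypothetical.

```python
def discounted_returns(rewards, gamma):
    """G_t = sum over k >= t of gamma^(k - t) * r_k, computed backwards in one pass."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

def pg_coefficients(rewards, gamma, include_state_discount):
    """Coefficient multiplying grad log pi(a_t | s_t) at each step t of one episode.

    include_state_discount=True keeps the gamma^t factor of the discounted objective;
    False reproduces the commonly used (mismatched) form that omits it.
    """
    returns = discounted_returns(rewards, gamma)
    if include_state_discount:
        return [gamma ** t * g for t, g in enumerate(returns)]
    return returns
```

For rewards `[1.0, 1.0, 1.0]` and γ = 0.5, the returns are `[1.75, 1.5, 1.0]`. The γ^t-weighted coefficients are `[1.75, 0.75, 0.25]`, so the two forms diverge increasingly for later time steps.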

Model-free policy learning with reward gradients

Q Lan, S Tosatto, H Farrahi, AR Mahmood - arXiv preprint arXiv …, 2021 - arxiv.org
… Reward Policy Gradient estimator, a novel approach that integrates reward gradients without …
, we develop a new policy gradient estimator—the Reward Policy Gradient (RPG) estimator…

Policy gradient with serial markov chain reasoning

E Cetin, O Celiktutan - Advances in Neural Information …, 2022 - proceedings.neurips.cc
… method to estimate the policy gradient. Hence, we implement a new effective off-policy
algorithm for maximum entropy reinforcement learning (MaxEnt RL) [27, 28], named Steady-State …