
Offline reinforcement learning with Anderson acceleration for robotic tasks


Abstract

Offline reinforcement learning (RL) can learn an effective policy from a fixed batch of data without further interaction with the environment. However, real-world requirements such as high performance and sample efficiency pose substantial challenges to current offline RL algorithms. In this paper, we propose a novel offline RL method, Constrained and Conservative Reinforcement Learning with Anderson Acceleration (CCRL-AA), which enables the agent to learn effectively and efficiently from offline demonstration data. In our method, Constrained and Conservative Reinforcement Learning (CCRL) restricts the policy’s actions with respect to the batch of training data and learns a conservative Q-function, so that the agent learns effectively from the previously collected demonstrations. The mechanism of Anderson acceleration (AA) is integrated to speed up the learning process and improve sample efficiency. Experiments were conducted on robotic simulation tasks, and the results demonstrate that our method learns efficiently from the given demonstrations and outperforms several state-of-the-art methods.
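The abstract describes the two ingredients of CCRL-AA only at a high level: a behavior-constrained, conservative Q-function and Anderson acceleration of the value updates. As a rough illustration of the Anderson-acceleration mechanism alone, the following minimal sketch (not the authors' implementation; the toy MDP, function names, and the memory size m are all hypothetical) applies it to tabular value iteration: each update is a weighted combination of the last m Bellman backups, with weights chosen to minimize the combined residual.

```python
import numpy as np

# Hypothetical toy MDP, used only to illustrate the acceleration step.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
# P[a, s, :] is the next-state distribution for taking action a in state s.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(size=(n_states, n_actions))   # reward R[s, a]

def bellman_backup(v):
    """Bellman optimality operator T(v) for the toy MDP."""
    q = R + gamma * np.einsum("asn,n->sa", P, v)
    return q.max(axis=1)

def anderson_accelerated_value_iteration(m=3, iters=60):
    """Value iteration where each update mixes the last m Bellman backups.

    The mixing weights alpha minimize the norm of the combined residuals
    g(v_i) - v_i subject to sum(alpha) = 1, which is the core idea of
    Anderson acceleration.
    """
    v = np.zeros(n_states)
    past_v, past_g = [], []
    for _ in range(iters):
        g = bellman_backup(v)
        past_v.append(v)
        past_g.append(g)
        past_v, past_g = past_v[-m:], past_g[-m:]
        F = np.stack([gi - vi for vi, gi in zip(past_v, past_g)], axis=1)
        k = F.shape[1]
        # Solve min_alpha ||F @ alpha||^2 s.t. sum(alpha) = 1 in closed form
        # (normal equations with a small ridge term for numerical stability).
        A = F.T @ F + 1e-8 * np.eye(k)
        alpha = np.linalg.solve(A, np.ones(k))
        alpha /= alpha.sum()
        v = np.stack(past_g, axis=1) @ alpha   # accelerated value estimate
    return v

print(anderson_accelerated_value_iteration())
```

In the paper's deep offline-RL setting, the analogous combination would presumably be applied to the critic's value estimates rather than to a tabular value function, on top of the constrained and conservative training objectives; the sketch above only conveys the acceleration mechanism itself.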



Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (61873008), the Beijing Natural Science Foundation (4192010), and the National Key R&D Plan (2018YFB1307004).


Author information

Corresponding author

Correspondence to Jiangeng Li.

Ethics declarations

Competing Interests

All authors declare that they have no conflict of interest and agree to the submission of this manuscript to Applied Intelligence.

Additional information

Author contributions

All authors contributed to the study conception and design. Material preparation, experimental design and data analysis were performed by Guoyu Zuo and Shuai Huang. Manuscript writing and organization were done by Jiangeng Li. Review and commentary were done by Daoxiong Gong. All authors read and approved this manuscript.

Availability of data and materials

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

Ethical Approval

The authors declare that they have no conflict of interest. This paper has not been previously published; it is published with the permission of the authors’ institution, and all authors are responsible for the authenticity of the data in the paper.

Consent to Participate

All authors of this paper have been informed of the revision and publication of the paper, have checked all data, figures and tables in the manuscript, and are responsible for their truthfulness and accuracy. Names of all contributing authors: Guoyu Zuo; Shuai Huang; Jiangeng Li; Daoxiong Gong.

Consent to Publish

The publication has been approved by all co-authors.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zuo, G., Huang, S., Li, J. et al. Offline reinforcement learning with Anderson acceleration for robotic tasks. Appl Intell 52, 9885–9898 (2022). https://doi.org/10.1007/s10489-021-02953-8
