
Offline reinforcement learning with Anderson acceleration for robotic tasks


Abstract

Offline reinforcement learning (RL) can learn an effective policy from a fixed batch of data without further interaction with the environment. However, real-world requirements such as high performance and sample efficiency pose substantial challenges to current offline RL algorithms. In this paper, we propose a novel offline RL method, Constrained and Conservative Reinforcement Learning with Anderson Acceleration (CCRL-AA), which enables the agent to learn effectively and efficiently from offline demonstration data. In our method, Constrained and Conservative Reinforcement Learning (CCRL) restricts the policy’s actions with respect to the batch of training data and learns a conservative Q-function, so that the agent learns effectively from the previously collected demonstrations. The mechanism of Anderson acceleration (AA) is integrated to speed up the learning process and improve sample efficiency. Experiments were conducted on robotic simulation tasks, and the results demonstrate that our method learns efficiently from the given demonstrations and outperforms several state-of-the-art methods.
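The abstract describes the two ingredients of CCRL-AA only at a high level: a behavior-constrained, conservative Q-function and Anderson acceleration of the value updates. As a rough illustration of the Anderson-acceleration mechanism alone, the following minimal sketch (not the authors' implementation; the toy MDP, function names, and the memory size m are all hypothetical) applies it to tabular value iteration: each update is a weighted combination of the last m Bellman backups, with weights chosen to minimize the combined residual.

```python
import numpy as np

# Hypothetical toy MDP, used only to illustrate the acceleration step.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
# P[a, s, :] is the next-state distribution for taking action a in state s.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(size=(n_states, n_actions))   # reward R[s, a]

def bellman_backup(v):
    """Bellman optimality operator T(v) for the toy MDP."""
    q = R + gamma * np.einsum("asn,n->sa", P, v)
    return q.max(axis=1)

def anderson_accelerated_value_iteration(m=3, iters=60):
    """Value iteration where each update mixes the last m Bellman backups.

    The mixing weights alpha minimize the norm of the combined residuals
    g(v_i) - v_i subject to sum(alpha) = 1, which is the core idea of
    Anderson acceleration.
    """
    v = np.zeros(n_states)
    past_v, past_g = [], []
    for _ in range(iters):
        g = bellman_backup(v)
        past_v.append(v)
        past_g.append(g)
        past_v, past_g = past_v[-m:], past_g[-m:]
        F = np.stack([gi - vi for vi, gi in zip(past_v, past_g)], axis=1)
        k = F.shape[1]
        # Solve min_alpha ||F @ alpha||^2 s.t. sum(alpha) = 1 in closed form
        # (normal equations with a small ridge term for numerical stability).
        A = F.T @ F + 1e-8 * np.eye(k)
        alpha = np.linalg.solve(A, np.ones(k))
        alpha /= alpha.sum()
        v = np.stack(past_g, axis=1) @ alpha   # accelerated value estimate
    return v

print(anderson_accelerated_value_iteration())
```

In the paper's deep offline-RL setting, the analogous combination would presumably be applied to the critic's value estimates rather than to a tabular value function, on top of the constrained and conservative training objectives; the sketch above only conveys the acceleration mechanism itself.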



Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (61873008), the Beijing Natural Science Foundation (4192010), and the National Key R&D Plan (2018YFB1307004).


Author information

Corresponding author

Correspondence to Jiangeng Li.

Ethics declarations

Competing Interests

All authors declare that they have no conflict of interest and agree to the submission of this manuscript to Applied Intelligence.

Additional information

Author contributions

All authors contributed to the study conception and design. Material preparation, experimental design and data analysis were performed by Guoyu Zuo and Shuai Huang. Manuscript writing and organization were done by Jiangeng Li. Review and commentary were done by Daoxiong Gong. All authors read and approved this manuscript.

Availability of data and materials

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

Ethical Approval

The authors declare that they have no conflict of interest. This paper has not been previously published; it is published with the permission of the authors’ institution, and all authors are responsible for the authenticity of the data in the paper.

Consent to Participate

All authors of this paper have been informed of the revision and publication of the paper, have checked all data, figures and tables in the manuscript, and are responsible for their truthfulness and accuracy. Names of all contributing authors: Guoyu Zuo; Shuai Huang; Jiangeng Li; Daoxiong Gong.

Consent to Publish

The publication has been approved by all co-authors.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zuo, G., Huang, S., Li, J. et al. Offline reinforcement learning with Anderson acceleration for robotic tasks. Appl Intell 52, 9885–9898 (2022). https://doi.org/10.1007/s10489-021-02953-8
