
A general motion control framework for an autonomous underwater vehicle through deep reinforcement learning and disturbance observers. (English) Zbl 1516.93192

Summary: This paper investigates the application of deep reinforcement learning (RL) to motion control of an autonomous underwater vehicle (AUV), and proposes a novel general motion control framework that separates training from deployment. First, the state space, action space, and reward function are customized while preserving generality across various motion control tasks. Next, in order to efficiently learn the optimal motion control policy when the AUV model is imprecise and unknown external disturbances are present, a virtual AUV model composed of the known and determined terms of the actual AUV is put forward, and a simulation training method is developed on this basis. Then, in the proposed deployment method, three independent extended state observers (ESOs) are designed to handle the unknown terms in different directions, and the final controller is obtained by compensating the ESO estimates into the output of the optimal motion control policy obtained through simulation training. Finally, soft actor-critic is chosen as the deep RL algorithm of the framework, and the generality and effectiveness of the proposed method are verified on four different AUV motion control tasks.
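To make the deployment step concrete, the following is a minimal Python sketch of one possible realization: a second-order linear ESO per controlled direction, with its disturbance estimate subtracted from the output of the learned policy before the command is sent to the vehicle. The names (`SecondOrderESO`, `deploy_control`), the bandwidth parameterization via `omega_o`, and the compensation form `u = pi(s) - f_hat / b0` are illustrative assumptions for a standard active-disturbance-rejection-style observer, not the authors' exact formulation.

```python
import numpy as np

class SecondOrderESO:
    """Second-order linear ESO for one channel (illustrative sketch).

    Assumes the channel dynamics can be written as x_dot = f(t) + b0 * u,
    where f lumps model uncertainty and external disturbance. The observer
    tracks the measured state with z1 and the total disturbance f with z2.
    """
    def __init__(self, b0, omega_o, dt):
        self.b0 = b0                 # nominal control gain of this channel (assumed known)
        self.beta1 = 2.0 * omega_o   # observer gains parameterized by bandwidth omega_o
        self.beta2 = omega_o ** 2
        self.dt = dt
        self.z1 = 0.0                # estimate of the measured state
        self.z2 = 0.0                # estimate of the total disturbance

    def update(self, x_measured, u):
        """Advance the observer one step with explicit Euler integration."""
        e = x_measured - self.z1
        self.z1 += self.dt * (self.z2 + self.b0 * u + self.beta1 * e)
        self.z2 += self.dt * (self.beta2 * e)
        return self.z2               # current disturbance estimate


def deploy_control(policy, state, esos, b0s):
    """Compose the deployed command: policy output plus ESO compensation.

    `policy` is the motion control policy trained in simulation on the
    virtual (disturbance-free) AUV model; each ESO is assumed to have been
    updated with the latest measurement and control of its own channel.
    """
    u_nominal = np.asarray(policy(state))
    compensation = np.array([eso.z2 / b0 for eso, b0 in zip(esos, b0s)])
    return u_nominal - compensation


# Illustrative usage: one independent ESO per controlled direction.
esos = [SecondOrderESO(b0=1.0, omega_o=10.0, dt=0.01) for _ in range(3)]
```

Parameterizing the observer gains by a single bandwidth per channel keeps the deployment stage tuning to one scalar per direction, which matches the spirit of using three independent ESOs rather than a coupled multivariable observer.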

MSC:

93C85 Automated systems (robots, etc.) in control theory
93B53 Observers
