Objective To improve the robustness of autonomous underwater vehicle (AUV) controllers against environment modeling errors, this paper proposes a reinforcement learning control strategy that incorporates contextual information and a curriculum learning training mechanism.
Method First, contextual information is embedded into the policy network by using interaction history data as part of the policy network's input; second, a curriculum learning training mechanism is designed that gradually increases the disturbance strength during training, avoiding the training instability and premature stopping caused by excessive disturbances. Fixed-depth control experiments are conducted in a simulation environment, and the effectiveness of the algorithm is further verified with a real AUV in a water tank.
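The two mechanisms above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the linear ramp schedule, and the fixed-length history window are all illustrative assumptions.

```python
import numpy as np

def disturbance_scale(episode, warmup=200, ramp=800, max_scale=1.0):
    """Curriculum schedule (illustrative): no disturbance during the warmup
    episodes, then a linear ramp up to max_scale, so that early training is
    not destabilized by strong disturbances."""
    if episode < warmup:
        return 0.0
    return min(max_scale, max_scale * (episode - warmup) / ramp)

def build_policy_input(obs, history, horizon, hist_dim):
    """Concatenate the current observation with the last `horizon`
    interaction-history vectors (e.g. past observation-action pairs),
    zero-padded when the history is short, so the policy can infer
    contextual information such as unmodeled dynamics."""
    recent = [np.asarray(h) for h in list(history)[-horizon:]]
    pad = [np.zeros(hist_dim)] * (horizon - len(recent))
    return np.concatenate([np.asarray(obs)] + pad + recent)
```

For example, with a 3-dimensional observation, a history window of 4, and 5-dimensional history entries, the policy input is a fixed-length vector of 3 + 4 x 5 = 23 elements regardless of how many steps have elapsed.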
Results The experimental results show that the proposed algorithm improves the convergence speed by 25.00% and the steady-state reward by 10.81%, effectively improving the training process, and achieves depth tracking without steady-state error in the simulation environment. In the tank experiment, compared with the domain randomization algorithm and the baseline algorithm, the average depth tracking error of our method was reduced by 45.81% and 63.00%, respectively, and its standard deviation was reduced by 36.17% and 52.76%, respectively, effectively improving tracking accuracy and stability.
Conclusion The research results provide a useful reference for applying deep reinforcement learning methods to AUV control.