Probabilistic perception-based TS-DQN decision-making for autonomous USV submarine search

刘婷; 王辉; 郑凯; 高其强

doi:10.19693/j.issn.1673-3185.04440

Abstract:
Objective This study proposes a deep reinforcement learning-based intelligent search algorithm for unmanned surface vehicles (USVs) tasked with detecting submarines.
Method Set against the scenario of submarines infiltrating key maritime areas, a search environment and a kinematic model are first constructed. A sonar detection probability model that combines the effects of distance and angle is established, and the criteria for a successful search are defined explicitly. The problem is then formulated as a Markov decision process (MDP) in which the USV acts as the reinforcement learning agent. The state space is designed to include the detection probability, and a multi-term reward function couples detection probability, distance, and angle. Finally, to solve this MDP, a temporal-difference deep Q-network (TS-DQN) algorithm is proposed that combines a double dueling network architecture with prioritized experience replay. A detection-probability-aware ε-greedy strategy lets the agent adjust its tendency to explore automatically according to the current detection state, which improves policy learning efficiency.
Results Simulation experiments show that the proposed method achieves a detection success rate of 38.85%, about 18 times that of the second-ranked method, Dueling DQN (2.15%). Its average path length is 334.36 steps, more than 9.5% shorter than those of the other algorithms.
Conclusion The proposed algorithm shows combined advantages in detection efficiency and effectiveness and offers a new solution for autonomous USV search technology.
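
The abstract does not give the formula behind the detection-probability-aware ε-greedy strategy, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes ε shrinks linearly as the current detection probability rises, so the agent explores more when the target is poorly detected. The function names, the linear schedule, and the bounds `eps_min` and `eps_max` are hypothetical and are not taken from the paper.

```python
import random


def probability_aware_epsilon(p_detect, eps_min=0.05, eps_max=1.0):
    """Hypothetical schedule: explore less as detection probability rises.

    The linear form and the default bounds are illustrative assumptions,
    not the schedule used in the paper.
    """
    p = min(max(p_detect, 0.0), 1.0)  # clamp the probability to [0, 1]
    return eps_min + (eps_max - eps_min) * (1.0 - p)


def select_action(q_values, p_detect, rng=random):
    """Epsilon-greedy choice over a list of Q-values.

    With probability epsilon a random action index is returned (explore);
    otherwise the index of the largest Q-value is returned (exploit).
    """
    eps = probability_aware_epsilon(p_detect)
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With these assumed defaults, a detection probability of 0 yields ε = 1.0 (pure exploration) and a detection probability of 1 yields ε = 0.05 (almost always greedy). Any monotone decreasing schedule could be substituted for the linear one.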
