Probabilistic perception-based TS-DQN decision-making for autonomous USV submarine search

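  • Abstract (translated from the Chinese):
    Objective To address the problem of unmanned surface vehicles (USVs) searching for submarines, an intelligent USV submarine-search algorithm based on deep reinforcement learning is proposed.
    Method First, the search environment and a kinematic model are constructed, a sonar detection probability model that combines the effects of distance and angle is established, and the criteria for judging a search task successful are defined explicitly. Next, the problem is formalized as a Markov decision process (MDP) in which the USV is the reinforcement learning agent; based on the requirements of the search task, the state space is designed to include the detection probability, and a multi-term reward function couples detection probability, distance, and angle. Finally, to solve the MDP, a temporal difference deep Q-network (TS-DQN) algorithm is proposed that fuses a double dueling network architecture with prioritized experience replay. It introduces a detection-probability-aware ε-greedy strategy that lets the agent adjust its exploration tendency automatically according to the current detection state of the environment, which improves policy learning efficiency.
    Results Simulation experiments show that the proposed method reaches a detection success rate of 38.85%, about 18 times that of the second-ranked Dueling DQN (success rate 2.15%); the average path length is 334.36 steps, more than 9.5% shorter than that of the other algorithms.
    Conclusion The combined advantages in search efficiency and effectiveness offer a new solution for the development of autonomous USV search technology.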
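
The Method paragraph describes a sonar detection probability model that depends on both distance and angle, together with an explicit success criterion. The abstract does not give the functional form, so the sketch below is only an illustration under stated assumptions: the Gaussian-like range decay, the cosine beam pattern, and the names `detection_probability`, `is_detected`, `max_range`, `beam_half_width`, and `threshold` are all hypothetical and are not taken from the paper.

```python
import math


def detection_probability(distance, bearing,
                          max_range=1000.0,
                          beam_half_width=math.pi / 3):
    """Hypothetical sonar model (assumed form, not the paper's).

    distance: range to the target, in the same units as max_range.
    bearing:  angle between the sonar axis and the target, in radians.
    Returns a probability in [0, 1] that decreases with range and
    with off-axis angle.
    """
    # Outside the assumed range or beam, nothing can be detected.
    if distance >= max_range or abs(bearing) >= beam_half_width:
        return 0.0
    # Assumed range term: smooth decay with normalized distance.
    range_term = math.exp(-3.0 * (distance / max_range) ** 2)
    # Assumed angle term: 1 on the axis, 0 at the beam edge.
    angle_term = math.cos(0.5 * math.pi * bearing / beam_half_width)
    return range_term * angle_term


def is_detected(distance, bearing, threshold=0.8):
    """Assumed success criterion: probability at or above a threshold."""
    return detection_probability(distance, bearing) >= threshold
```

In a setup like this, the value returned by `detection_probability` could be appended to the agent's state vector and reused inside the reward, which is one way to read the "state space including detection probability" mentioned above.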

     

    Abstract:
    Objective This study aims to develop a deep reinforcement learning-based search algorithm for unmanned surface vehicles (USVs) in submarine detection tasks.
    Method A search environment and a kinematic model are constructed for the scenario of submarines infiltrating key maritime areas. A sonar detection probability model that accounts for both distance and angle is developed, and the criteria for a successful search are defined explicitly. The problem is then formulated as a Markov decision process (MDP) in which the USV acts as the reinforcement learning agent. The state space explicitly includes the detection probability, and a multi-term reward function couples detection probability, distance, and angle. To solve the MDP, a temporal difference deep Q-network (TS-DQN) algorithm is proposed that combines a double dueling network architecture with prioritized experience replay. In addition, a probabilistic perception-based ε-greedy strategy lets the agent adjust its exploration tendency automatically according to the current detection state, which improves policy learning efficiency.
    Results Simulation experiments show that the proposed method achieves a detection success rate of 38.85%, about 18 times that of the second-best method, Dueling DQN (2.15%). The average path length is 334.36 steps, more than 9.5% shorter than that of the other algorithms.
    Conclusion The proposed algorithm exhibits significant advantages in detection efficiency and effectiveness, providing an innovative solution for advancing autonomous USV-based search and detection technologies.
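
The abstract names three ingredients of the learning algorithm: a detection-probability-aware ε-greedy strategy, a dueling network head, and double-DQN targets. The following is a minimal sketch of how these pieces are commonly written, not the authors' implementation. The linear schedule in `perception_epsilon` (explore less as the detection probability rises) is an assumption, since the abstract does not state the rule, and all function names and default values are hypothetical. The dueling aggregation and the double-DQN target follow their standard definitions; prioritized experience replay is omitted for brevity.

```python
import random


def perception_epsilon(p_detect, eps_min=0.05, eps_max=1.0):
    """Assumed schedule: exploration rate shrinks linearly as the
    current detection probability p_detect grows from 0 to 1."""
    p = min(max(p_detect, 0.0), 1.0)
    return eps_min + (eps_max - eps_min) * (1.0 - p)


def select_action(q_values, p_detect, rng=random):
    """ε-greedy action choice with a detection-dependent ε."""
    if rng.random() < perception_epsilon(p_detect):
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


def dueling_q(value, advantages):
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Standard double-DQN target: the online network picks the next
    action, the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]
```

With this schedule the agent explores almost uniformly when nothing is detected (`p_detect` near 0) and acts almost greedily once the target is likely in view, which matches the qualitative behavior the abstract describes.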

     
