Objective In unmanned surface vessel (USV) swarm missions, target recognition, situational awareness, and autonomous decision-making control are often hindered by insufficient generalization and slow response, particularly in scenarios with small samples, limited resources, and complex dynamics. This paper aims to develop an end-to-end multi-agent architecture that combines the reasoning capabilities of large language models with real-time control to improve the collaborative combat and autonomous control capabilities of USV swarms in complex environments.
Method This study proposes a three-tier collaborative 'perception–understanding–decision' architecture, termed the maritime commander agent (MCA), which integrates the Qwen2.5-72B large language model throughout the USV swarm decision-making process. By combining prompt engineering with proportional-integral-derivative (PID) control, the approach tightly couples high-level strategic planning with precise low-level control. The system comprises three agents: a target perception agent, which uses YOLOv8l for high-precision detection and localization of multiple classes of maritime targets; a situational understanding agent, which uses structured prompt templates to translate perception results into high-level natural-language situational descriptions; and a decision-making agent, which combines large language model inference with external computational tools to generate control commands, with PID regulation improving response speed. The architecture requires no additional model fine-tuning and exhibits low latency and strong adaptability.
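The split between slow language model reasoning and fast PID regulation can be illustrated with a minimal sketch. The abstract does not state which quantity the PID loop regulates, its gains, or its rate, so everything here is an assumption: the decision-making agent is assumed to emit a target heading, a discrete PID converts heading error into a normalized rudder command, and the vessel is replaced by a toy first-order yaw model. The names `PID`, `wrap_angle`, `track_heading`, the gains, and the turn rate are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


@dataclass
class PID:
    """Discrete PID controller with output clamping (hypothetical gains)."""
    kp: float
    ki: float
    kd: float
    out_min: float = -1.0
    out_max: float = 1.0
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        """Return a clamped control output for one sample period dt."""
        self.integral += error * dt
        # Skip the derivative term on the first sample to avoid a spike
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(self.out_min, min(self.out_max, u))


def track_heading(target: float, heading: float = 0.0, steps: int = 200,
                  dt: float = 0.1, max_turn_rate: float = 0.5) -> float:
    """Toy loop: the LLM layer is assumed to supply `target`; PID runs every dt seconds."""
    # ki = 0 here because the toy yaw model already integrates the command
    pid = PID(kp=2.0, ki=0.0, kd=0.1)
    for _ in range(steps):
        error = wrap_angle(target - heading)
        rudder = pid.step(error, dt)  # normalized command in [-1, 1]
        # First-order yaw model: turn rate proportional to rudder command
        heading = wrap_angle(heading + rudder * max_turn_rate * dt)
    return heading
```

Calling `track_heading(math.radians(90))` drives the toy heading to approximately 90 degrees within the 20 simulated seconds. The point of the design is the separation of rates: a slow, expensive language model call only updates the setpoint, while `PID.step` can run on every control tick.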
Results Experimental results show that the system achieves high object detection accuracy on the public ABOships dataset and under degraded conditions such as heavy fog and heavy rain. The semantic conversion accuracy of situational understanding reaches 93.5%. In simulation, the success rate of the 4v1 encirclement task rises from 20% with traditional rule-based methods to 80%, and that of the 10v10 adversarial task rises from 25% to 75%. These results validate the system's robustness and cross-domain generalization in complex maritime environments.
Conclusion The proposed MCA architecture retains the high-level cognitive reasoning of large language models while using PID control to improve real-time response and execution accuracy, significantly improving the collaborative decision-making of USV swarms in dynamic tasks. This research provides a practical technical framework and engineering reference for intelligent maritime swarm systems.

This study constructs an end-to-end multi-agent system based on large language models to address target identification, situational understanding, and autonomous decision-making in USV swarm operations. Existing USV decision-making systems rely largely on rule-based methods, which often fall short in complex scenarios with limited data and resources. Although large language models offer strong reasoning and generalization, they struggle to meet the requirements of high-frequency, real-time control. To address this, the paper proposes a three-tier collaborative 'perception–understanding–decision' architecture, termed the maritime commander agent (MCA). The MCA exploits the reasoning capabilities of large language models for intelligent decision-making while integrating PID control to improve system response speed. The system consists of three types of agents: target perception agents, responsible for multi-target detection and spatial localization; a situational understanding agent, which converts perception results into high-level natural-language situational descriptions; and a decision-making agent, which generates real-time control commands, linking high-level planning with low-level execution.
Simulation results show that this approach outperforms traditional rule-based decision-making methods on key metrics such as task completion rate and adversarial success rate, demonstrating its potential as a new paradigm for intelligent maritime swarm collaboration systems.
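The situational understanding agent's use of structured prompt templates can be pictured with a small sketch. The abstract does not publish the template, so the `Detection` fields, the eight-sector bearing convention, the 0.5 confidence threshold, and the prompt wording are all hypothetical; the sketch only shows the general idea of turning detector output (class, confidence, bearing, range) into text that a model such as Qwen2.5-72B could consume. The model call itself is omitted.

```python
from dataclasses import dataclass
from typing import List

# Eight 45-degree sectors, clockwise from the bow (hypothetical convention)
SECTORS = ["bow", "starboard bow", "starboard beam", "starboard quarter",
           "stern", "port quarter", "port beam", "port bow"]

# Hypothetical prompt wording; the authors' template is not given
SITUATION_TEMPLATE = (
    "You are the situational understanding agent of a USV swarm.\n"
    "Own ship: {ship_id}, heading {heading_deg:.0f} deg.\n"
    "Detected {n} target(s):\n"
    "{targets}\n"
    "Mission: {mission}\n"
    "Summarize the tactical situation and the most urgent threat."
)


@dataclass
class Detection:
    label: str          # class name from the detector
    confidence: float   # detector score in [0, 1]
    bearing_deg: float  # relative bearing, clockwise from own bow
    range_m: float      # estimated distance in metres


def bearing_sector(bearing_deg: float) -> str:
    """Map a relative bearing in degrees to one of eight named sectors."""
    # The final modulo guards against float rounding that yields exactly 360.0
    index = int(((bearing_deg + 22.5) % 360) // 45) % len(SECTORS)
    return SECTORS[index]


def build_situation_prompt(ship_id: str, heading_deg: float,
                           detections: List[Detection], mission: str,
                           min_conf: float = 0.5) -> str:
    """Fill the template with confident detections, nearest first."""
    kept = sorted((d for d in detections if d.confidence >= min_conf),
                  key=lambda d: d.range_m)
    lines = [f"- {d.label} ({d.confidence:.0%}) at {d.range_m:.0f} m, "
             f"{bearing_sector(d.bearing_deg)}" for d in kept]
    return SITUATION_TEMPLATE.format(
        ship_id=ship_id, heading_deg=heading_deg, n=len(kept),
        targets="\n".join(lines) or "- none", mission=mission)
```

Filtering low-confidence detections and sorting by range before templating keeps the prompt short and places the nearest contacts first, which is one plausible way to limit inference latency; the authors' actual filtering and ordering are not stated.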