WANG G J, SHI W Z, LUO W, et al. 3D semantic enhanced diffusion policy for intelligent shipborne robot[J]. Chinese Journal of Ship Research, 2026, 21(X): 1–14 (in Chinese). DOI: 10.19693/j.issn.1673-3185.04806

3D semantic enhanced diffusion policy for intelligent shipborne robot

  • Objective With the advancement of artificial intelligence and sensor fusion technologies, shipborne robots with target recognition and autonomous execution capabilities are becoming a vital development direction for future combat systems. During autonomous operations, however, these robots are constrained by limited onboard computing resources and deployment conditions, which hinders high-level cognition and decision-making. To address this challenge, a lightweight 3D semantic-enhanced diffusion policy (SGDP) framework is proposed, aiming to ensure recognition accuracy, response speed, and decision-making stability under limited resources and dynamic disturbances.
    Methods First, a semantic projection mechanism based on 3D Gaussian splatting is introduced to construct dense semantic point clouds centered on the operational objects, and multi-view semantic fusion is used to obtain object-level 3D semantic representations that provide effective semantic priors for complex manipulation tasks. Next, the semantic field is updated in real time from the 6D pose estimates of the FoundationPose estimator, which eliminates repeated extraction of multi-view semantic features and thereby improves inference efficiency while maintaining semantic consistency in dynamic scenes. Finally, a multimodal diffusion policy integrating semantic, geometric, and joint-state information is designed to enhance semantic perception while remaining lightweight enough for practical deployment.
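The first two stages of this pipeline can be sketched in a minimal numpy-only form: per-view 2D semantic features are back-projected onto the object's 3D points and averaged, and the resulting semantic point cloud is then moved rigidly with each new 6D pose instead of being re-extracted. The function names (`fuse_views`, `update_semantic_field`) and the simple averaging scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the semantic-field pipeline described in the Methods.
# Function names and the averaging scheme are illustrative, not from the paper.
import numpy as np

def fuse_views(points, feat_maps, intrinsics, extrinsics):
    """Fuse per-view 2D semantic features onto each 3D point by averaging.

    points:     (N, 3) object points in the world frame
    feat_maps:  list of (H, W, C) per-view semantic feature maps
    intrinsics: list of (3, 3) camera matrices
    extrinsics: list of (4, 4) world-to-camera transforms
    Returns (N, C) fused per-point semantic features.
    """
    n, c = len(points), feat_maps[0].shape[2]
    feats = np.zeros((n, c))
    hits = np.zeros(n)
    homog = np.hstack([points, np.ones((n, 1))])          # (N, 4) homogeneous
    for fmap, K, T in zip(feat_maps, intrinsics, extrinsics):
        cam = (T @ homog.T).T[:, :3]                      # points in camera frame
        in_front = cam[:, 2] > 1e-6
        uv = (K @ cam.T).T
        uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)  # perspective divide
        u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
        h, w = fmap.shape[:2]
        valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        feats[valid] += fmap[v[valid], u[valid]]
        hits[valid] += 1
    return feats / np.clip(hits[:, None], 1, None)        # average over views

def update_semantic_field(canonical_points, pose):
    """Move the fused semantic point cloud with a new 6D pose (4x4 matrix)
    instead of re-running multi-view feature extraction each frame; the
    per-point features stay attached to the transformed points."""
    homog = np.hstack([canonical_points, np.ones((len(canonical_points), 1))])
    return (pose @ homog.T).T[:, :3]
```

Because the features are fused once in a canonical frame, each control step only costs one rigid transform, which is what makes the high update frequencies reported below plausible on limited hardware.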
    Results Evaluations on a shipborne robot test platform, covering three complex tasks (placing a knife, grasping a marker pen, and pouring water from a bottle), demonstrated that the SGDP algorithm, using only a single depth camera for environmental observation, achieved semantic field update and action inference frequencies of 39.71 Hz and 32.16 Hz, respectively. This represents a fivefold increase over the GenDP baseline and enables real-time closed-loop control on computationally limited platforms. GPU memory usage was also reduced by 20.25% relative to GenDP, indicating significantly lower computational cost alongside the improved performance. For tasks with known objects, an average success rate of 81.67% was achieved, while 78.33% was maintained for tasks involving unknown objects, demonstrating strong zero-shot generalization capability and reduced reliance on extensive task-specific data collection.
    Conclusion The results indicate that the proposed framework offers an efficient and viable integrated perception-decision solution, achieving effective synergy between environmental cognition and autonomous decision-making under resource-limited conditions. Thus, an effective technical pathway is provided to alleviate the inherent conflict between the requirements for lightweight deployment and high-level autonomous decision-making in single-unit unmanned systems such as shipborne robots.