Detection of water surface targets based on improved Deformable DETR

WANG Pengjiu; GONG Junbin; LUO Wei; HUANG Xiao; GUO Junjie

doi:10.19693/j.issn.1673-3185.03645

Citation: WANG P J, GONG J B, LUO W, et al. Detection of water surface targets based on improved Deformable DETR[J]. Chinese Journal of Ship Research, 2025, 20(X): 1–13 (in Chinese). DOI: 10.19693/j.issn.1673-3185.03645

Abstract

Objective With advances in technology and growing demand for water resource exploration, water surface target detection plays a crucial role in applications such as ship navigation and maritime safety. However, traditional detection methods face challenges, and existing deep-learning-based algorithms are limited in this field by scarce datasets and by detection speed that remains insufficient even after improvement. This study develops an improved object detection algorithm based on Deformable DETR for intelligent recognition of water surface targets. The goal is to significantly increase the model's training and inference speed while improving detection accuracy, thus achieving more efficient and robust water surface target detection.
Methods First, a new water surface target dataset was constructed. Second, the original feature extraction network of Deformable DETR was replaced with the lightweight MobileNetV3, which offers high recognition accuracy with a small number of parameters. Of the available versions, MobileNetV3-Small was chosen as the feature extraction backbone. It is built on depthwise separable convolutions and also includes SE modules and the Hard-swish activation function. To further reduce the model size and enhance detection ability, three output feature maps from specific modules of MobileNetV3-Small were used directly for multi-scale feature extraction. Third, the CBAM attention module was introduced. CBAM is a lightweight, general-purpose module that combines channel attention and spatial attention and can be easily integrated into a network. Replacing the SE module in MobileNetV3 with CBAM further improved the model's ability to extract features. The channel attention module in CBAM applies average pooling and max pooling to the input feature map, then uses a shared neural network and a sigmoid function to generate channel attention features. The spatial attention module takes the feature map produced by the channel attention module, pools it along the channel dimension, and then applies a convolution and sigmoid activation to obtain spatial attention features. Finally, the improved Deformable DETR network was obtained by integrating MobileNetV3 and the CBAM attention module. The input image passes through the MobileNetV3-Small network with embedded CBAM, and three feature maps at different scales are extracted. These feature maps are processed and then fed into the Transformer structure of Deformable DETR.
Results Ablation experiments were carried out on the self-constructed dataset and the ABOships dataset. On the self-constructed dataset, compared with the original Deformable DETR model, the improved algorithm reduced the model's parameter count and size to about one-third. Inference speed increased by 52.0%, mAP0.5:0.95 increased by 2.4%, and training time fell to 41.7% of that of the original algorithm. On the ABOships dataset, inference speed increased by 82.7%, mAP0.5:0.95 increased by 7.5%, and training time was 51.9% of that of the original. The loss value also converged faster and more stably during training. In comparison tests against other common algorithms (YOLOv3, Faster R-CNN, Mask R-CNN) on the ABOships dataset, the improved algorithm performed best on accuracy. Its mAP0.5 reached 50.0%, higher than that of the other algorithms, and its mAP0.5:0.95 was 21.7%, leading in fine-grained detection. The model's parameter count was only 12.9M, much lower than that of the others, indicating high parameter efficiency. Although its frame rate was slightly lower than that of YOLOv3 and Faster R-CNN, it was significantly higher than that of Mask R-CNN, so the model maintains a reasonable processing speed while ensuring high detection accuracy.
Conclusions The improved Deformable DETR algorithm proposed in this paper effectively improves the performance of water surface target detection. It reduces the model's parameter count and storage requirements, accelerates training and inference, and improves recognition accuracy. The experimental results on different datasets verify the effectiveness of the algorithm. This study explores a new path for applying DETR-type algorithms to water surface target detection and indicates their potential in this field.
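
The CBAM computation described under Methods (channel attention followed by spatial attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the bias-free two-layer shared network, the reduction ratio, and the 7x7 spatial kernel are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Reweight channels of feat (C, H, W).

    w1 (C//r, C) and w2 (C, C//r) are the weights of a shared two-layer
    network (assumed bias-free here); r is a hypothetical reduction ratio.
    """
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def shared_mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # linear -> ReLU -> linear

    scale = sigmoid(shared_mlp(avg) + shared_mlp(mx))  # (C,)
    return feat * scale[:, None, None]

def spatial_attention(feat, kernel):
    """Reweight spatial positions of feat (C, H, W).

    kernel (2, k, k) convolves the stacked channel-wise average and max
    maps; k is assumed odd so that zero padding preserves H and W.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    height, width = pooled.shape[1:]
    logits = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]

def cbam(feat, w1, w2, kernel):
    """Channel attention first, then spatial attention on its output."""
    return spatial_attention(channel_attention(feat, w1, w2), kernel)

# Toy usage with random, untrained weights (illustrative sizes only).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 6))
w1 = rng.standard_normal((2, 8)) * 0.1   # C=8, hypothetical r=4
w2 = rng.standard_normal((8, 2)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
print(out.shape)
```

Because both attention maps pass through a sigmoid, each output value is the input value scaled by a factor between 0 and 1, and the feature map keeps its shape. That shape-preserving property is what makes it plausible to swap such a module in where an SE module previously sat.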
