Citation: ZHANG X Y, YANG Y G, ZHANG B L, et al. Visual attribute prompt learning method for fine-grained ship recognition[J]. Chinese Journal of Ship Research (in Chinese). DOI: 10.19693/j.issn.1673-3185.04407.
This study proposes a Visual Attribute Prompt Learning (VAPT) mechanism to address challenges such as strong interference, limited data, and inadequate modeling of semantic features in deep learning-based ship image recognition tasks.
The framework constructs a large-scale pre-trained visual attribute codebook and incorporates a Multi-head Cross-Attention Mechanism (MCA) for attribute matching and selection, enabling effective alignment with deep visual models to enhance their ability to recognize critical ship features.
The proposed method is validated on a meticulously annotated custom ship image dataset, achieving a notable 3.79% improvement in Top-1 accuracy over the baseline Vision Transformer (ViT). On the public FGSCR-42 dataset, the proposed method achieves Top-1 accuracy gains of 2.20% and 0.10% over the state-of-the-art B-CNN and RA-CNN methods, respectively.
The research provides a novel technical solution for feature decoupling and knowledge transfer in target recognition tasks under complex marine conditions, with implications for intelligent maritime monitoring systems.
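The attribute matching step described above can be illustrated with a minimal sketch of multi-head cross-attention, in which image tokens act as queries and codebook entries act as keys and values. This is not the authors' implementation: the dimensions, the random projection matrices, the helper names (`random_matrix`, `cross_attention`, `select_attributes`), and the top-k selection rule are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical stand-in for learned weights or pre-trained features."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attention(queries, codebook, w_q, w_k, w_v, num_heads):
    """Image tokens (queries) attend over attribute codebook entries (keys, values)."""
    q, k, v = matmul(queries, w_q), matmul(codebook, w_k), matmul(codebook, w_v)
    head_dim = len(q[0]) // num_heads
    outputs = [[] for _ in queries]
    weights_per_head = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        # Scaled dot-product scores between every token and every attribute.
        weights = [
            softmax([sum(a * b for a, b in zip(qr, kr)) / math.sqrt(head_dim) for kr in k_h])
            for qr in q_h
        ]
        # Concatenate this head's output onto each token's feature vector.
        for i, row in enumerate(matmul(weights, v_h)):
            outputs[i].extend(row)
        weights_per_head.append(weights)
    return outputs, weights_per_head


def select_attributes(weights_per_head, top_k):
    """Assumed selection rule: rank attributes by attention averaged over heads and tokens."""
    num_attributes = len(weights_per_head[0][0])
    totals = [0.0] * num_attributes
    count = 0
    for head in weights_per_head:
        for row in head:
            count += 1
            for j, w in enumerate(row):
                totals[j] += w
    means = [t / count for t in totals]
    return sorted(range(num_attributes), key=lambda j: means[j], reverse=True)[:top_k]


rng = random.Random(0)
num_tokens, num_attributes, dim, num_heads = 4, 6, 8, 2
image_tokens = random_matrix(num_tokens, dim, rng, scale=1.0)
codebook = random_matrix(num_attributes, dim, rng, scale=1.0)
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

attended, attn = cross_attention(image_tokens, codebook, w_q, w_k, w_v, num_heads)
selected = select_attributes(attn, top_k=3)
print(len(attended), len(attended[0]), selected)
```

In this sketch each image token receives a weighted mixture of codebook entries, and the indices in `selected` are the attributes that received the most attention on average. A trained system would replace the random matrices with learned projections and a pre-trained codebook.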