BiCAG-Net for Marine Mammal Acoustic Signal Recognition Oriented to Vessel Environmental Perception
Abstract
Objectives: Vessel navigation must balance environmental perception with ecological safety. Vocalizing biological targets such as marine mammals therefore need to be identified promptly to support collision-avoidance warning, avoidance of sensitive habitats, and eco-friendly maritime missions.
Methods: Three problems are addressed: the local spectral textures of marine mammal acoustic signals are easily masked by ship-radiated noise; a single representation can hardly characterize both harmonic structure and temporal evolution; and existing multi-representation methods lack sufficiently deep interaction between representations. To address them, a recognition method based on a Bidirectional Cross-Attention and Gated Fusion Network (BiCAG-Net) is proposed. The method constructs dual time-frequency representations using log-Mel spectrograms and the constant-Q transform (CQT), and employs a ResNet18 branch and a Convolution-augmented Transformer (Conformer) branch to extract local spectral texture features and harmonic temporal features, respectively. Bidirectional cross-attention then provides deep cross-branch information interaction, and a gated fusion decision module adaptively weights the two branch outputs and the fused output, improving classification robustness under ship-noise conditions.
Results: At a signal-to-noise ratio (SNR) of 20 dB, the proposed method achieves a macro-averaged F1 score (Macro-F1) of 96.88% and an accuracy of 97.96% on the test set. Compared with the baseline methods, it obtains the highest Macro-F1 under all four SNR conditions tested: 20, 10, 0, and −10 dB.
Conclusions: The proposed method can provide technical support for marine mammal acoustic signal recognition and early risk warning in the complex marine environments that vessels encounter.
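The interaction and fusion steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the BiCAG-Net implementation. It assumes single-head attention without learned projections, residual connections, mean pooling, and random placeholder weights. All function names, shapes, and the class count are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    # Single-head scaled dot-product attention: queries attend to context.
    # Learned Q/K/V projections are omitted here (an assumption of this sketch).
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)      # (Tq, Tc)
    return softmax(scores, axis=-1) @ context      # (Tq, d)

def bidirectional_cross_attention(feat_a, feat_b):
    # Each branch attends to the other; the residual term keeps its own features.
    a_enriched = feat_a + cross_attention(feat_a, feat_b)
    b_enriched = feat_b + cross_attention(feat_b, feat_a)
    return a_enriched, b_enriched

def gated_fusion(logits_a, logits_b, logits_fused, gate_scores):
    # Softmax gate weights give a convex combination of the three logit vectors.
    weights = softmax(gate_scores)                            # (3,)
    stacked = np.stack([logits_a, logits_b, logits_fused])    # (3, C)
    return weights @ stacked, weights

rng = np.random.default_rng(0)
num_classes, dim = 4, 8                                # hypothetical sizes
texture_tokens = rng.standard_normal((6, dim))         # stand-in for ResNet18 branch features
harmonic_tokens = rng.standard_normal((10, dim))       # stand-in for Conformer branch features

texture_x, harmonic_x = bidirectional_cross_attention(texture_tokens, harmonic_tokens)

# Mean-pool each branch over its tokens, then concatenate for the fused path.
pooled_a = texture_x.mean(axis=0)
pooled_b = harmonic_x.mean(axis=0)
pooled_f = np.concatenate([pooled_a, pooled_b])

# Random placeholder weights; in a trained model these would be learned.
W_a = rng.standard_normal((dim, num_classes))
W_b = rng.standard_normal((dim, num_classes))
W_f = rng.standard_normal((2 * dim, num_classes))
W_g = rng.standard_normal((2 * dim, 3))

final_logits, weights = gated_fusion(
    pooled_a @ W_a, pooled_b @ W_b, pooled_f @ W_f, pooled_f @ W_g
)
print("gate weights:", weights)
print("predicted class index:", int(final_logits.argmax()))
```

Because the gate weights come from a softmax, they are non-negative and sum to one, so the final logits are always a weighted average of the two branch decisions and the fused decision. In this sketch a gate that strongly favors one input reduces the output to that input's logits.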