Heterogeneous Multi-Branch Feature Fusion Architecture for Underwater Acoustic Target Recognition
Abstract
Underwater acoustic target recognition is of significant value for both military and civilian applications in complex marine environments. However, traditional methods often fail to achieve robust performance because of strong noise interference, weak target signals, and insufficient complementarity among the features they rely on. This paper therefore proposes a deep learning recognition framework based on multimodal spectrogram features, in which short-time Fourier transform (STFT) magnitude spectra, mel-spectrograms, and continuous wavelet transform (CWT) spectrograms serve as multi-branch inputs, and a heterogeneous convolutional neural network extracts time-frequency feature representations from each modality. Multi-scale convolutional kernels are tailored to the characteristics of each feature type, allowing the network to capture energy textures, envelope characteristics, and multi-scale structural information effectively. At the fusion stage, a structure-dependent fusion strategy selection method is introduced, which achieves the optimal integration of multi-branch features through inverse-variance weighting. Experimental results show that the proposed method achieves an accuracy of 97.60% on underwater acoustic target recognition tasks, significantly outperforming single-modality and homogeneous multi-branch models, which validates the effectiveness of the proposed heterogeneous multi-branch architecture and its adaptive fusion strategy.
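Inverse-variance weighting assigns each branch the normalized weight $w_i = \sigma_i^{-2} / \sum_j \sigma_j^{-2}$, so branches with lower variance contribute more to the fused output. The sketch below illustrates only this weighting step, under assumptions not stated in the abstract: each branch (STFT, mel, CWT) is taken to output class probabilities, and each branch is taken to have a single scalar variance estimate (for example, measured on validation data). The function names `inverse_variance_weights` and `fuse_branch_probs` are hypothetical, and the structure-dependent strategy selection step is not modeled.

```python
import numpy as np


def inverse_variance_weights(variances):
    """Return normalized weights w_i = (1 / var_i) / sum_j (1 / var_j)."""
    v = np.asarray(variances, dtype=float)
    if np.any(v <= 0):
        raise ValueError("variances must be strictly positive")
    inv = 1.0 / v
    return inv / inv.sum()


def fuse_branch_probs(branch_probs, variances):
    """Fuse per-branch class probabilities by inverse-variance weighting.

    branch_probs: array of shape (n_branches, n_samples, n_classes).
    variances: one scalar variance estimate per branch (assumed, hypothetical).
    Returns an array of shape (n_samples, n_classes).
    """
    p = np.asarray(branch_probs, dtype=float)
    w = inverse_variance_weights(variances)
    # Contract the branch axis: sum_i w[i] * p[i] for every sample and class.
    return np.tensordot(w, p, axes=1)


# Hypothetical example: one sample, three classes, three branches.
stft_p = [[0.7, 0.2, 0.1]]
mel_p = [[0.5, 0.3, 0.2]]
cwt_p = [[0.6, 0.1, 0.3]]
fused = fuse_branch_probs([stft_p, mel_p, cwt_p], variances=[0.01, 0.04, 0.02])
predicted_class = int(np.argmax(fused, axis=1)[0])
```

With the illustrative variances above, the weights are 4/7, 1/7, and 2/7, so the lowest-variance (STFT) branch dominates. Because the weights sum to one, a convex combination of valid probability vectors remains a valid probability vector.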
© 2026 Yuexiong Yang, Zhenhao Chu, Zilong Peng, published by Gdansk University of Technology
This work is licensed under the Creative Commons Attribution 4.0 License.