Heterogeneous Multi-Branch Feature Fusion Architecture for Underwater Acoustic Target Recognition

Yuexiong Yang; Zhenhao Chu; Zilong Peng

doi:10.2478/pomr-2026-0028

Polish Maritime Research

Volume 33 (2026): Issue 2 (June 2026)

Open Access

May 2026

Abstract

Underwater acoustic target recognition is valuable for both military and civilian applications in complex marine environments. However, traditional methods often struggle to achieve robust performance because of strong noise interference, weak target signals, and insufficient complementarity among features. This paper therefore proposes a deep learning recognition framework based on multimodal spectrogram features. Short-time Fourier transform (STFT) magnitude spectra, mel-spectrograms, and continuous wavelet transform (CWT) spectrograms serve as multi-branch inputs, and a heterogeneous convolutional neural network extracts time-frequency feature representations from each modality. Multi-scale convolutional kernels are tailored to the characteristics of each feature type to capture energy textures, envelope characteristics, and multi-scale structural information. At the fusion stage, a structure-dependent fusion strategy selection method is introduced, which enables optimal integration of the multi-branch features through inverse-variance weighting. Experimental results show that the proposed method achieves an accuracy of 97.60% in underwater acoustic target recognition tasks, significantly outperforming single-modality and homogeneous multi-branch models, which supports the effectiveness of the proposed heterogeneous multi-branch architecture and its adaptive fusion strategy.
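
The abstract names inverse-variance weighting as the fusion mechanism but does not give its formulation. As a minimal sketch of the textbook form only, the snippet below weights each branch in proportion to the reciprocal of a variance estimate and normalises the weights to sum to one. The function names, the example class scores, and the per-branch variance values are all hypothetical illustrations and are not taken from the paper; how the authors estimate variance or where in the network the fusion is applied may differ.

```python
def inverse_variance_weights(variances, eps=1e-12):
    """Return weights proportional to 1/variance, normalised to sum to 1.

    eps guards against division by zero; it is an assumption of this sketch.
    """
    precisions = [1.0 / (v + eps) for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]


def fuse_branches(branch_scores, variances):
    """Fuse per-branch class-score vectors with inverse-variance weights."""
    weights = inverse_variance_weights(variances)
    n_classes = len(branch_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, branch_scores))
        for c in range(n_classes)
    ]


# Hypothetical class scores from three branches (STFT, mel, CWT) for one sample.
stft_scores = [0.70, 0.20, 0.10]
mel_scores = [0.60, 0.30, 0.10]
cwt_scores = [0.40, 0.40, 0.20]

# Hypothetical per-branch variance estimates (lower means more reliable).
branch_variances = [0.01, 0.02, 0.04]

weights = inverse_variance_weights(branch_variances)
fused = fuse_branches([stft_scores, mel_scores, cwt_scores], branch_variances)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

With these illustrative numbers the lowest-variance branch receives the largest weight, so the fused scores lean toward its prediction while still incorporating the other two branches.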

DOI: https://doi.org/10.2478/pomr-2026-0028 | Journal eISSN: 2083-7429 | Journal ISSN: 1233-2585

Language: English

Page range: 145 - 155

Published on: May 6, 2026

Published by: Gdansk University of Technology

In partnership with: Paradigm Publishing Services

Keywords: underwater acoustic target recognition, multimodal fusion, convolutional neural network (CNN), heterogeneous architecture, feature fusion

Related subjects: Engineering; Introductions and overviews; Engineering, other; Geosciences; Atmospheric science and climatology; Life sciences; Life sciences, other

© 2026 Yuexiong Yang, Zhenhao Chu, Zilong Peng, published by Gdansk University of Technology
This work is licensed under the Creative Commons Attribution 4.0 License.
