
Performance analysis of speech enhancement using spectral gating with U-Net

Open Access | Oct 2023

References

  1. Y. Masuyama, M. Togami and T. Komatsu, “Consistency-aware multi-channel speech enhancement using deep neural networks”, Proceedings 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 821-825, 2020. DOI: 10.1109/ICASSP40776.2020.9053501
  2. P. C. Loizou, Speech enhancement: theory and practice, 1st ed. Boca Raton: CRC Press, pp. 1-10, 2007.
  3. S. Gannot, E. Vincent, S. Markovich-Golan and A. Ozerov, “A consolidated perspective on multimicrophone speech enhancement and source separation”, IEEE/ACM Trans. on Audio, Speech, and Language Processing, vol. 25, no. 4, pp. 692-730, 2017. DOI: 10.1109/TASLP.2016.2647702
  4. C. Rascon, “Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications”, Sensors, vol. 23, no. 9, p. 4394, 2023. DOI: https://doi.org/10.3390/s23094394
  5. H. Garg, B. Sharma, S. Shekhar and R. Agarwal, “Spoofing detection system for e-health digital twin using Efficient Net Convolution Neural Network”, Multimedia Tools and Applications, vol. 81, no. 16, pp. 26873-26888, 2022. DOI: https://doi.org/10.1007/s11042-021-11578-5
  6. D. Agarwal and A. Bansal, “Fingerprint liveness detection through fusion of pores perspiration and texture features”, J. King Saud University-Computer and Information Sciences, vol. 34, no. 7, pp. 4089-4098, 2020. DOI: https://doi.org/10.1016/j.jksuci.2020.10.003
  7. G. Gosztolya and T. Grósz, “Domain adaptation of deep neural networks for automatic speech recognition via wireless sensors”, Journal of Electrical Engineering, vol. 67, no. 2, pp. 124-130, 2016.
  8. S. Shekhar, D. K. Sharma and M. M. Sufyan Beg, “Hindi Roman linguistic framework for retrieving transliteration variants using bootstrapping”, Procedia Computer Science, vol. 125, pp. 59-67, 2018. DOI: 10.1016/j.procs.2017.12.010
  9. R. Martinek, M. Kelnar, J. Vanus, P. Bilik and J. Zidek, “A robust approach for acoustic noise suppression in speech using ANFIS”, Journal of Electrical Engineering, vol. 66, no. 6, pp. 301-310, 2015. DOI: https://doi.org/10.2478/jee-2015-0050
  10. Y. Tsao and Y. H. Lai, “Generalized maximum a posteriori spectral amplitude estimation for speech enhancement”, Speech Communication, vol. 76, pp. 112-126, 2016. DOI: https://doi.org/10.1016/j.specom.2015.10.003
  11. J. Cheng, R. Liang and L. Zhao, “DNN-based speech enhancement with self-attention on feature dimension”, Multimedia Tools and Applications, vol. 79, pp. 32449-32470, 2020. DOI: https://doi.org/10.1007/s11042-020-09345-z
  12. S. Boll, “Suppression of acoustic noise in speech using spectral subtraction”, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, 1979. DOI: 10.1109/TASSP.1979.1163209
  13. P. Scalart, “Speech enhancement based on a priori signal to noise estimation”, Proceedings 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 629-632, 1996. DOI: 10.1109/ICASSP.1996.543199
  14. Y. Ephraim and D. Malah, “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator”, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109-1121, 1984. DOI: 10.1109/TASSP.1984.1164453
  15. C. Lan, Y. Wang, L. Zhang, C. Liu and X. Lin, “Research on Speech Enhancement Algorithm of Multiresolution Cochleagram Based on Skip Connection Deep Neural Network”, Journal of Sensors, vol. 2022, Article ID 5208372, 2022. DOI: https://doi.org/10.1155/2022/5208372
  16. Z. Kang, Z. Huang and C. Lu, “Speech Enhancement Using U-Net with Compressed Sensing”, Applied Sciences, vol. 12, no. 9, p. 4161, 2022. DOI: https://doi.org/10.3390/app12094161
  17. O. Ronneberger, P. Fischer and T. Brox, “U-net: Convolutional networks for biomedical image segmentation”, Proceedings 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, Cham, pp. 234-241, 2015. DOI: https://doi.org/10.1007/978-3-319-24574-4_28
  18. C. Geng and L. Wang, “End-to-end speech enhancement based on discrete cosine transform”, Proceedings 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), pp. 379-383, 2020. DOI: 10.1109/ICAICA50127.2020.9182513
  19. D. Stoller, S. Ewert and S. Dixon, “Wave-U-Net: A multi-scale neural network for end-to-end audio source separation”, arXiv preprint arXiv:1806.03185, 2018. DOI: https://doi.org/10.48550/arXiv.1806.03185
  20. C. Macartney and T. Weyde, “Improved speech enhancement with the Wave-U-Net”, arXiv preprint arXiv:1811.11307, 2018. DOI: https://doi.org/10.48550/arXiv.1811.11307
  21. B. Widrow, J. R. Glover, J. M. McCool, J. Kaunitz, C. S. Williams, R. H. Hearn and R. C. Goodlin, “Adaptive noise cancelling: Principles and applications”, Proceedings of the IEEE, vol. 63, no. 12, pp. 1692-1716, 1975. DOI: 10.1109/PROC.1975.10036
  22. M. Ravanelli, T. Parcollet, P. Plantinga, A. Rouhe, S. Cornell, L. Lugosch, and Y. Bengio, “SpeechBrain: A general-purpose speech toolkit”, arXiv preprint arXiv:2106.04624, 2021. DOI: https://doi.org/10.48550/arXiv.2106.04624
  23. V. Panayotov, G. Chen, D. Povey and S. Khudanpur, “Librispeech: an ASR corpus based on public domain audio books”, Proceedings 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206-5210, 2015. DOI: 10.1109/ICASSP.2015.7178964
  24. P. Loizou and Y. Hu, “NOIZEUS: A noisy speech corpus for evaluation of speech enhancement algorithms”, Speech Communication, vol. 49, pp. 588-601, 2007. DOI: 10.1016/j.specom.2006.12.006
  25. ITU-T Recommendation P.862, “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs”, International Telecommunication Union, 2001.
  26. M. Al-Akhras, K. Daqrouq and A. R. Al-Qawasmi, “Perceptual evaluation of speech enhancement,” In 2010 7th International Multi-Conference on Systems, Signals and Devices, pp. 1-6, IEEE, 2010. DOI: 10.1109/SSD.2010.5585514
  27. M. Kolbaek, Z. H. Tan and J. Jensen, “On the relationship between short-time objective intelligibility and short-time spectral-amplitude mean-square error for speech enhancement”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 2, pp. 283-295, 2018. DOI: 10.1109/TASLP.2018.2877909
  28. R. Giri, U. Isik and A. Krishnaswamy, “Attention Wave-U-Net for speech enhancement”, Proceedings 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 249-253, 2019. DOI: 10.1109/WASPAA.2019.8937186
DOI: https://doi.org/10.2478/jee-2023-0044 | Journal eISSN: 1339-309X | Journal ISSN: 1335-3632
Language: English
Page range: 365 - 373
Submitted on: Jul 26, 2023
Published on: Oct 21, 2023
Published by: Slovak University of Technology in Bratislava
In partnership with: Paradigm Publishing Services
Publication frequency: 6 times per year

© 2023 Jharna Agrawal, Manish Gupta, Hitendra Garg, published by Slovak University of Technology in Bratislava
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.