References
- Hu Y, Liu Y, Lv S, et al. DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement [J]. arXiv preprint arXiv:2008.00264, 2020.
- Wang K, He B, Zhu W P. TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain[C]//ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021: 7098-7102.
- Reddy C K A, Gopal V, Cutler R, et al. The INTERSPEECH 2020 deep noise suppression challenge: Datasets, subjective testing framework, and challenge results [J]. arXiv preprint arXiv:2005.13981, 2020.
- Fedorov I, Stamenovic M, Jensen C, et al. TinyLSTMs: Efficient neural speech enhancement for hearing aids [J]. arXiv preprint arXiv:2005.11138, 2020.
- Zhang H, Cisse M, Dauphin Y N, et al. mixup: Beyond empirical risk minimization [J]. arXiv preprint arXiv:1710.09412, 2017.
- Valin J M. A hybrid DSP/deep learning approach to real-time full-band speech enhancement[C]//2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP). IEEE, 2018: 1-5.
- Dubey H, Aazami A, Gopal V, et al. ICASSP 2023 deep noise suppression challenge [J]. IEEE Open Journal of Signal Processing, 2024, 5: 725-737.
- Graves A, Fernández S, Gomez F, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks[C]//Proceedings of the 23rd International Conference on Machine Learning. 2006: 369-376.
- Baevski A, Zhou Y, Mohamed A, et al. wav2vec 2.0: A framework for self-supervised learning of speech representations [J]. Advances in Neural Information Processing Systems, 2020, 33: 12449-12460.
- Elman J L. Finding structure in time [J]. Cognitive Science, 1990, 14(2): 179-211.
- Pascanu R, Mikolov T, Bengio Y. On the difficulty of training recurrent neural networks[C]//International Conference on Machine Learning. PMLR, 2013: 1310-1318.
- Hochreiter S. Untersuchungen zu dynamischen neuronalen Netzen [D]. Technische Universität München, 1991.
- Graves A. Long short-term memory [J]. Supervised Sequence Labelling with Recurrent Neural Networks, 2012: 37-45.
- Gers F A, Schmidhuber J, Cummins F. Learning to forget: Continual prediction with LSTM [J]. Neural Computation, 2000, 12(10): 2451-2471.
- Sak H, Senior A W, Beaufays F. Long short-term memory recurrent neural network architectures for large scale acoustic modeling[C]//Interspeech. 2014: 338-342.
- Bottou L. Large-scale machine learning with stochastic gradient descent[C]//Proceedings of COMPSTAT'2010: 19th International Conference on Computational Statistics, Paris, France, August 22-27, 2010, Keynote, Invited and Contributed Papers. Heidelberg: Physica-Verlag HD, 2010: 177-186.
- Seide F, Fu H, Droppo J, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs[C]//Interspeech. 2014: 1058-1062.
- Veaux C, Yamagishi J, MacDonald K. CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit [J]. University of Edinburgh. The Centre for Speech Technology Research (CSTR), 2017, 6: 15.
- Morris A C, Maier V, Green P D. From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition[C]//Interspeech. 2004: 2765-2768.
- Wen X, Li W. Time series prediction based on LSTM attention-LSTM model [J]. IEEE Access, 2023, 11: 48322-48331.
- Liu X. Deep convolutional and LSTM neural networks for acoustic modelling in automatic speech recognition [J]. 2018.
- Kitza M, Golik P, Schlüter R, et al. Cumulative adaptation for BLSTM acoustic models [J]. arXiv preprint arXiv:1906.06207, 2019.
- Zeyer A, Irie K, Schlüter R, et al. Improved training of end-to-end attention models for speech recognition [J]. arXiv preprint arXiv:1805.03294, 2018.