
An Intelligent Framework for Person Identification Using Voice Recognition and Audio Data Classification

Open Access | Jan 2023

References

[1] D.S. Park, W. Chan, Y. Zhang, C.C. Chiu, B. Zoph, E.D. Cubuk, and Q.V. Le, “SpecAugment: A simple data augmentation method for automatic speech recognition,” 2019, arXiv preprint arXiv:1904.08779. https://doi.org/10.48550/arXiv.1904.08779
[2] T. Fukuda, O. Ichikawa, and M. Nishimura, “Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition,” Speech Communication, vol. 98, pp. 95–103, Apr. 2018. https://doi.org/10.1016/j.specom.2018.01.008
[3] M. Wickert, “Real-time digital signal processing using pyaudio_helper and the ipywidgets,” in Proceedings of the 17th Python in Science Conference, Austin, TX, USA, Jul. 2018, pp. 9–15. https://doi.org/10.25080/Majora-4af1f417-00e
[4] A. Srivastava and S. Maheshwari, “Signal denoising and multiresolution analysis by discrete wavelet transform,” Innovative Trends in Applied Physical, Chemical, Mathematical Sciences and Emerging Energy Technology for Sustainable Development, 2015.
[5] J.P. Dron and F. Bolaers, “Improvement of the sensitivity of the scalar indicators (crest factor, kurtosis) using a de-noising method by spectral subtraction: application to the detection of defects in ball bearings,” Journal of Sound and Vibration, vol. 270, no. 1–2, pp. 61–73, Feb. 2004. https://doi.org/10.1016/S0022-460X(03)00483-8
[6] E. Eban, A. Jansen, and S. Chaudhuri, “Filtering wind noises in video content,” U.S. Patent Application 15/826,622, March 22, 2018.
[7] B.B. Ali, W. Wojcik, O. Mamyrbayev, M. Turdalyuly, and N. Mekebayev, “Speech recognizer-based non-uniform spectral compression for robust MFCC feature extraction,” Przeglad Elektrotechniczny, vol. 94, no. 6, pp. 90–93, Jun. 2018. https://doi.org/10.15199/48.2018.06.17
[8] Ç.P. Dautov and M.S. Özerdem, “Wavelet transform and signal denoising using Wavelet method,” in 2018 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, May 2018, pp. 1–4. https://doi.org/10.1109/SIU.2018.8404418
[9] R. Liu, L.O. Hall, K.W. Bowyer, D.B. Goldgof, R. Gatenby, and K.B. Ahmed, “Synthetic minority image over-sampling technique: How to improve AUC for glioblastoma patient survival prediction,” in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, Oct. 2017, pp. 1357–1362. https://doi.org/10.1109/SMC.2017.8122802
[10] R. Lotfian and C. Busso, “Over-sampling emotional speech data based on subjective evaluations provided by multiple individuals,” IEEE Transactions on Affective Computing, vol. 12, no. 4, pp. 870–882, Feb. 2019. https://doi.org/10.1109/TAFFC.2019.2901465
[11] I. Khan, A. Ullah, and S.M. Emad, “Robust feature extraction techniques in speech recognition: A comparative analysis,” KIET Journal of Computing and Information Sciences, vol. 2, no. 2, p. 11, 2019.
[12] E. Mulyanto, E.M. Yuniarno, and M.H. Purnomo, “Adding an emotions filter to Javanese text-to-speech system,” in 2018 International Conference on Computer Engineering, Network and Intelligent Multimedia (CENIM), Surabaya, Indonesia, Nov. 2018, pp. 142–146. https://doi.org/10.1109/CENIM.2018.8711229
[13] H. Liao, G. Pundak, O. Siohan, M.K. Carroll, N. Coccaro, Q.M. Jiang, T.N. Sainath, A. Senior, F. Beaufays, and M. Bacchiani, “Large vocabulary automatic speech recognition for children,” in Sixteenth Annual Conference of the International Speech Communication Association, Dresden, Germany, Sep. 2015, pp. 1611–1615. https://doi.org/10.21437/Interspeech.2015-373
[14] K.E. Kafoori and S.M. Ahadi, “Robust recognition of noisy speech through partial imputation of missing data,” Circuits, Systems, and Signal Processing, vol. 37, no. 4, pp. 1625–1648, Apr. 2018. https://doi.org/10.1007/s00034-017-0616-4
[15] H.F.C. Chuctaya, R.N.M. Mercado, and J.J.G. Gaona, “Isolated automatic speech recognition of Quechua numbers using MFCC, DTW and KNN,” International Journal of Advanced Computer Science and Applications, vol. 9, no. 10, pp. 24–29, 2018. https://doi.org/10.14569/IJACSA.2018.091003
[16] A. Winursito, R. Hidayat, A. Bejo, and M.N.Y. Utomo, “Feature data reduction of MFCC using PCA and SVD in speech recognition system,” in 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), Shah Alam, Malaysia, Jul. 2018, pp. 1–6. https://doi.org/10.1109/ICSCEE.2018.8538414
[17] L.N. Thu, A. Win, and H.N. Oo, “A review for reduction of noise by wavelet transform in audio signals,” International Research Journal of Engineering and Technology (IRJET), vol. 6, no. 5, May 2019.
[18] Y. Luo and N. Mesgarani, “TasNet: Time-domain audio separation network for real-time, single-channel speech separation,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, Apr. 2018, pp. 696–700. https://doi.org/10.1109/ICASSP.2018.8462116
[19] E. Ramentol, Y. Caballero, R. Bello, and F. Herrera, “SMOTE-RSB*: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory,” Knowledge and Information Systems, vol. 33, no. 2, pp. 245–265, Nov. 2012. https://doi.org/10.1007/s10115-011-0465-6
[20] L. Abdi and S. Hashemi, “To combat multi-class imbalanced problems by means of over-sampling techniques,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 1, pp. 238–251, Jul. 2015. https://doi.org/10.1109/TKDE.2015.2458858
[21] A.E. Martin, “A compositional neural architecture for language,” Journal of Cognitive Neuroscience, vol. 32, no. 8, pp. 1407–1427, Aug. 2020. https://doi.org/10.1162/jocn_a_01552
[22] S. Böck, F. Korzeniowski, J. Schlüter, F. Krebs, and G. Widmer, “Madmom: A new Python audio and music signal processing library,” in Proceedings of the 24th ACM International Conference on Multimedia, Oct. 2016, pp. 1174–1178. https://doi.org/10.1145/2964284.2973795
[23] Z. Wang, “Audio processing method and apparatus based on artificial intelligence,” U.S. Patent 10,192,163, Baidu Online Network Technology (Beijing) Co., Ltd., 2019.
[24] J.P. Cunningham and Z. Ghahramani, “Linear dimensionality reduction: Survey, insights, and generalizations,” The Journal of Machine Learning Research, vol. 16, no. 1, pp. 2859–2900, 2015. https://stat.columbia.edu/~cunningham/pdf/CunninghamJMLR2015.pdf
[25] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015. https://doi.org/10.1038/nature14539
[26] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, Mar. 2016, pp. 4960–4964. https://doi.org/10.1109/ICASSP.2016.7472621
[27] J.K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, “Attention-based models for speech recognition,” in Advances in Neural Information Processing Systems, Proceedings of the Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 2015, pp. 577–585. https://proceedings.neurips.cc/paper/2015/file/1068c6e4c8051cfd4e9ea8072e3189e2-Paper.pdf
[28] V.Z. Këpuska and H.A. Elharati, “Robust speech recognition system using conventional and hybrid features of MFCC, LPCC, PLP, RASTA-PLP and hidden Markov model classifier in noisy conditions,” Journal of Computer and Communications, vol. 3, no. 6, pp. 1–9, Jun. 2015. https://doi.org/10.4236/jcc.2015.36001
[29] Ç.P. Dautov and M.S. Özerdem, “Wavelet transform and signal denoising using Wavelet method,” in 2018 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, May 2018, pp. 1–4. https://doi.org/10.1109/SIU.2018.8404418
[30] N.V. Chawla, K.W. Bowyer, L.O. Hall, and W.P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002. https://doi.org/10.1613/jair.953
[31] Z. Tüske, R. Schlüter, and H. Ney, “Acoustic modeling of speech waveform based on multi-resolution, neural network signal processing,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, Apr. 2018, pp. 4859–4863. https://doi.org/10.1109/ICASSP.2018.8461871
[32] R. Shadiev, T.T. Wu, A. Sun, and Y.M. Huang, “Applications of speech-to-text recognition and computer-aided translation for facilitating cross-cultural learning through a learning activity: issues and their solutions,” Educational Technology Research and Development, vol. 66, no. 1, pp. 191–214, Feb. 2018. https://doi.org/10.1007/s11423-017-9556-8
DOI: https://doi.org/10.2478/acss-2022-0019 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 183–189
Published on: Jan 24, 2023
Published by: Riga Technical University
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Isra Khan, Shah Muhammad Emaduddin, Ashhad Ullah, A Rafi Ullah, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.