Detection of Laryngeal Pathologies from Voice using EMD-based Mel-Spectrograms and Scalograms with AlexNet

Open Access | Oct 2025

Language: English
Page range: 276 - 283
Submitted on: Oct 21, 2024
Accepted on: Aug 28, 2025
Published on: Oct 15, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: Volume open

© 2025 Sofiane Cherif, Abdelhafid Kaddour, Abdelmoudjib Benkada, Said Karoui, published by Slovak Academy of Sciences, Institute of Measurement Science
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.