Detection of Laryngeal Pathologies from Voice using EMD-based Mel-Spectrograms and Scalograms with AlexNet

Sofiane Cherif; Abdelhafid Kaddour; Abdelmoudjib Benkada; Said Karoui

doi:10.2478/msr-2025-0030

Measurement Science Review

Volume 25 (2025): Issue 5 (October 2025)

Open Access

Oct 2025

DOI: https://doi.org/10.2478/msr-2025-0030 | Journal eISSN: 1335-8871

Language: English

Page range: 276–283

Submitted on: Oct 21, 2024

Accepted on: Aug 28, 2025

Published on: Oct 15, 2025

Published by: Slovak Academy of Sciences, Institute of Measurement Science

In partnership with: Paradigm Publishing Services

Publication frequency: Volume open

Keywords: laryngeal pathology detection, voice signal processing, empirical mode decomposition, Mel-spectrogram, scalogram, AlexNet convolutional neural network

Related subjects: Engineering; Electrical engineering; Control engineering, metrology and testing

© 2025 Sofiane Cherif, Abdelhafid Kaddour, Abdelmoudjib Benkada, Said Karoui, published by Slovak Academy of Sciences, Institute of Measurement Science
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.