Detection of Laryngeal Pathologies from Voice using EMD-based Mel-Spectrograms and Scalograms with AlexNet

Sofiane Cherif; Abdelhafid Kaddour; Abdelmoudjib Benkada; Said Karoui

doi:10.2478/msr-2025-0030


Measurement Science Review

Volume 25 (2025): Issue 5 (October 2025)

By: Sofiane Cherif, Abdelhafid Kaddour, Abdelmoudjib Benkada and Said Karoui

Open Access | Oct 2025

Figures & Tables

Fig. 1. Synoptic representation of the proposed method: voice signals first undergo signal processing, followed by EMD to extract IMFs. The most energetic IMF is selected to compute Mel-spectrograms and scalograms, which are then fed into a pretrained AlexNet-CNN model for voice pathology classification.

Fig. 2. Voice signals after pre-processing, including noise filtering, amplitude normalization, and segmentation into fixed-length frames. These enhanced signals serve as input for subsequent EMD-based decomposition and feature extraction (e.g., MFCCs and scalograms) to distinguish pathological from healthy voice patterns.
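The normalization and framing steps named in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the peak-normalization rule, the non-overlapping frames, and the synthetic 150 Hz tone are all assumptions, and the noise-filtering step is left out because the page does not say which filter is used.

```python
import numpy as np

def normalize_amplitude(signal):
    """Scale a signal so that its peak absolute amplitude is 1 (assumed rule)."""
    signal = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(signal))
    return signal if peak == 0 else signal / peak

def split_into_frames(signal, frame_length):
    """Cut a signal into non-overlapping frames, dropping the incomplete tail."""
    n_frames = len(signal) // frame_length
    return signal[: n_frames * frame_length].reshape(n_frames, frame_length)

# Synthetic stand-in for a recording: 1 s of a 150 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
voice = 0.3 * np.sin(2 * np.pi * 150 * t)

frames = split_into_frames(normalize_amplitude(voice), frame_length=2000)
```

Here `frames` holds four frames of 2000 samples each; a real pipeline would pick the frame length from the recording protocol.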

Fig. 3. The voice signal is decomposed into IMFs using EMD. Each IMF represents a distinct oscillatory mode, ordered from high to low frequency content, capturing features relevant to voice characteristics. The most energetic IMF (highlighted) is selected for further analysis, including MFCC-based and scalogram image generation, to support the classification between pathological and healthy voice signals.
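Selecting the most energetic IMF can be illustrated with a short sketch. It assumes the IMFs are already available as rows of an array (the EMD step itself is not shown) and that "energy" means the sum of squared samples; the page does not state the exact criterion, and `most_energetic_imf` and the toy sinusoids are hypothetical.

```python
import numpy as np

def most_energetic_imf(imfs):
    """Return (index, imf) of the row with the largest sum of squared samples."""
    imfs = np.asarray(imfs, dtype=float)
    energies = np.sum(imfs ** 2, axis=1)
    idx = int(np.argmax(energies))
    return idx, imfs[idx]

# Toy stand-ins for IMFs, ordered from high to low frequency content.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
toy_imfs = np.vstack([
    0.2 * np.sin(2 * np.pi * 200 * t),  # weak high-frequency mode
    1.0 * np.sin(2 * np.pi * 50 * t),   # dominant mode
    0.5 * np.sin(2 * np.pi * 5 * t),    # slow trend-like mode
])

idx, selected = most_energetic_imf(toy_imfs)
```

In this toy case the second row has the largest amplitude, so `idx` is 1 and `selected` is that row.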

Fig. 4. Example of a Mel spectrogram generated from the most energetic IMF of a pre-processed voice signal. The representation emphasizes perceptually meaningful spectral patterns used to discriminate between healthy and pathological voices in classification tasks.
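For readers unfamiliar with the representation, a Mel spectrogram can be sketched with NumPy alone: a windowed short-time Fourier transform followed by a bank of triangular filters spaced evenly on the mel scale. The mel formula below is the common 2595·log10(1 + f/700) variant, and the FFT size, hop, number of bands, and test tone are assumptions chosen for illustration; the parameters used in the paper are not given on this page.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)  # FFT bin frequencies
    edges = mel_to_hz(np.linspace(0.0, float(hz_to_mel(fs / 2.0)), n_mels + 2))
    bank = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrogram(signal, fs, n_fft=512, hop=128, n_mels=40):
    """Log-power Mel spectrogram in dB, shape (n_mels, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = power @ mel_filterbank(n_mels, n_fft, fs).T
    return 10.0 * np.log10(mel_power + 1e-10).T

# Synthetic stand-in for the selected IMF: a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
S = mel_spectrogram(np.sin(2 * np.pi * 1000 * t), fs)
```

The resulting matrix has one row per mel band and one column per frame; for a pure tone, the energy concentrates in the band whose triangle covers the tone frequency.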

Fig. 5. Scalogram example derived from the most energetic IMF of a pre-processed voice signal using CWT. The representation highlights relevant time-frequency patterns for the subsequent classification of healthy and pathological voices.
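A scalogram is the magnitude of a continuous wavelet transform evaluated over a set of scales. The sketch below uses a complex Morlet wavelet, which is an assumption: this page does not name the mother wavelet used in the paper. The function name, the `w0` parameter, the frequency grid, and the two-tone test signal are likewise illustrative.

```python
import numpy as np

def morlet_scalogram(signal, fs, freqs, w0=6.0):
    """|CWT| with a complex Morlet wavelet, one row per analysis frequency."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for f in freqs:
        scale = w0 / (2.0 * np.pi * f)         # scale (s) whose center frequency is f
        half = int(np.ceil(4.0 * scale * fs))  # keep +/- 4 standard deviations
        tau = np.arange(-half, half + 1) / fs
        wavelet = np.exp(1j * w0 * tau / scale) * np.exp(-0.5 * (tau / scale) ** 2)
        wavelet /= np.sqrt(scale) * fs         # 1/sqrt(scale) factor and time step
        # Correlation with the wavelet, written as a convolution.
        coeffs = np.convolve(signal, np.conj(wavelet)[::-1], mode="same")
        rows.append(np.abs(coeffs))
    return np.array(rows)

# Toy signal: 50 Hz in the first half second, 200 Hz in the second.
fs = 2000
t = np.arange(fs) / fs
toy = np.where(t < 0.5, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 200 * t))

freqs = np.geomspace(20.0, 400.0, 40)
S = morlet_scalogram(toy, fs, freqs)

# Dominant analysis frequency at t = 0.25 s and t = 0.75 s.
f_early = freqs[np.argmax(S[:, 500])]
f_late = freqs[np.argmax(S[:, 1500])]
```

The ridge of the scalogram follows the change from the low tone to the high tone, which is the kind of time-frequency pattern the caption refers to. The signal must be longer than the longest wavelet for `mode="same"` to return one coefficient per input sample.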

Fig. 6. Confusion matrix showing the performance of AlexNet-CNN on scalograms of the most energetic IMFs obtained with EMD and CWT.
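Accuracy figures such as those reported in the comparison table are derived from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical and are not taken from the paper, and treating "pathological" as the positive class is an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from a 2x2 confusion matrix (positive = pathological)."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on pathological voices
        "specificity": tn / (tn + fp),  # recall on healthy voices
        "precision": tp / (tp + fp),
    }

# Hypothetical counts for illustration only, NOT the paper's results.
metrics = binary_metrics(tp=42, fn=8, fp=12, tn=38)
```

Reporting sensitivity and specificity alongside accuracy matters when the healthy and pathological classes are unbalanced.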

Comparison of our method with recent studies on pathological voice detection.

Study	Dataset	Features and model	Accuracy [%]
[29]	SVD	Multipeak, Gaussian mixture model (GMM)	91.83
[30]	SVD + HUPA	MFCCs, SVM	71.45–76.19
[31]	MEEI voice disorders	MFCC (500 ms frames, 5 ms shift), SVM	66.4–75.1
[32]	SVD + HUPA	wav2vec, SVM	68.55–83.11
[33]	SVD + HUPA	Mel-spectrogram, SVM	69.45–75
[34]	VOICED	wav2vec 2.0, SVM / KNN	98
[35]	UA-speech + TORGO	MFCCs, SVM	63.13–89.22
This work	SVD	EMD-IMF, Mel-spectrogram + scalogram, AlexNet-CNN	85.66 / 86.4

DOI: https://doi.org/10.2478/msr-2025-0030 | Journal eISSN: 1335-8871

Language: English

Page range: 276–283

Submitted on: Oct 21, 2024

Accepted on: Aug 28, 2025

Published on: Oct 15, 2025

Published by: Slovak Academy of Sciences, Institute of Measurement Science

In partnership with: Paradigm Publishing Services

Publication frequency: Volume open

Keywords: laryngeal pathology detection, voice signal processing, empirical mode decomposition, Mel-spectrogram, scalogram, AlexNet convolutional neural network

Related subjects: Engineering, Electrical engineering, Control engineering, metrology and testing

© 2025 Sofiane Cherif, Abdelhafid Kaddour, Abdelmoudjib Benkada, Said Karoui, published by Slovak Academy of Sciences, Institute of Measurement Science
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
