Detection of Laryngeal Pathologies from Voice using EMD-based Mel-Spectrograms and Scalograms with AlexNet

Open Access | Oct 2025

Figures & Tables

Fig. 1.

Synoptic representation of the proposed method: voice signals first undergo signal processing, followed by EMD to extract IMFs. The most energetic IMF is selected to compute Mel-spectrograms and scalograms, which are then fed into a pretrained AlexNet-CNN model for voice pathology classification.

Fig. 2.

Voice signals after pre-processing, including noise filtering, amplitude normalization, and segmentation into fixed-length frames. These enhanced signals serve as input for subsequent EMD-based decomposition and feature extraction (e.g., MFCCs and scalograms) to distinguish pathological from healthy voice patterns.
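
The amplitude normalization and fixed-length framing named in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the peak-normalization rule, and the frame and hop lengths are assumptions, and the noise-filtering step is left out because the caption does not say which filter is used.

```python
def normalize_amplitude(signal):
    """Scale samples so the largest absolute value is 1.0 (peak normalization, assumed)."""
    if not signal:
        return []
    peak = max(abs(s) for s in signal)
    if peak == 0:
        return list(signal)  # silent signal: nothing to scale
    return [s / peak for s in signal]


def split_into_frames(signal, frame_length, hop_length):
    """Cut the signal into fixed-length frames; a trailing partial frame is dropped."""
    frames = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frames.append(signal[start:start + frame_length])
    return frames


# Toy samples standing in for a recorded voice signal (hypothetical values).
raw = [0.0, 0.5, -2.0, 1.0, 0.25, -0.5, 1.5, 0.0, -1.0, 0.75]
normalized = normalize_amplitude(raw)
frames = split_into_frames(normalized, frame_length=4, hop_length=2)
```

With a hop shorter than the frame length, consecutive frames overlap; whether the article uses overlapping frames is not stated in the caption.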

Fig. 3.

The voice signal is decomposed into IMFs using EMD. Each IMF represents a distinct oscillatory mode, ordered from high to low frequency content, capturing features relevant to voice characteristics. The most energetic IMF (highlighted) is selected for further analysis, including MFCC-based and scalogram image generation, to support the classification between pathological and healthy voice signals.
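
Selecting the most energetic IMF can be illustrated with a short sketch. It assumes that the IMFs have already been computed by an EMD routine and that "energy" means the sum of squared samples; the caption does not give the exact definition, so both the criterion and the helper names are assumptions.

```python
import math


def imf_energy(imf):
    """Energy of one IMF, taken here as the sum of squared samples (assumed definition)."""
    return sum(s * s for s in imf)


def most_energetic_imf(imfs):
    """Return the index of the IMF with the largest energy and all energies."""
    energies = [imf_energy(imf) for imf in imfs]
    best = max(range(len(imfs)), key=lambda i: energies[i])
    return best, energies


# Three synthetic "IMFs" ordered from high to low frequency (hypothetical data,
# standing in for the output of a real EMD decomposition).
n = 200
imfs = [
    [0.2 * math.sin(2 * math.pi * 40 * k / n) for k in range(n)],
    [1.0 * math.sin(2 * math.pi * 10 * k / n) for k in range(n)],
    [0.5 * math.sin(2 * math.pi * 2 * k / n) for k in range(n)],
]
best_index, energies = most_energetic_imf(imfs)
```

In this toy case the second component has the largest amplitude, so it is the one that would be passed on to image generation.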

Fig. 4.

Example of a Mel spectrogram generated from the most energetic IMF of a pre-processed voice signal. The representation emphasizes perceptually meaningful spectral patterns used to discriminate between healthy and pathological voices in classification tasks.
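
The "perceptually meaningful" frequency axis of a Mel spectrogram comes from the mel scale, which spaces bands closely at low frequencies and widely at high frequencies. The sketch below uses the common formula mel = 2595 log10(1 + f/700); the caption does not say which mel variant, frequency range, or number of bands the authors use, so these are assumptions chosen for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (2595 * log10(1 + f/700) variant, assumed)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(f_min, f_max, n_bands):
    """Center frequencies (Hz) of n_bands bands equally spaced on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_bands + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_bands)]


# Hypothetical settings: 10 bands between 0 Hz and 8 kHz.
centers = mel_band_centers(0.0, 8000.0, 10)
```

The centers are evenly spaced in mels but increasingly far apart in Hz, which is why a Mel spectrogram devotes more resolution to the low-frequency region.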

Fig. 5.

Scalogram example derived from the most energetic IMF of a pre-processed voice signal using CWT. The representation highlights relevant time-frequency patterns for the subsequent classification of healthy and pathological voices.
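
A scalogram is the squared magnitude of the CWT coefficients over time and scale. The following is a small, direct-convolution sketch under stated assumptions: the complex Morlet wavelet with center frequency 6 is a common default but the caption does not name the wavelet the authors use, and the test tone, sampling rate, and analysis frequencies are hypothetical.

```python
import cmath
import math

OMEGA0 = 6.0  # Morlet center frequency in radians (assumed default)


def morlet(t):
    """Complex Morlet wavelet evaluated at dimensionless time t."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-0.5 * t * t)


def frequency_to_scale(freq_hz, fs):
    """Scale (in samples) whose Morlet wavelet is centered on freq_hz."""
    return OMEGA0 * fs / (2.0 * math.pi * freq_hz)


def cwt_scalogram(signal, freqs_hz, fs):
    """Return |W|^2 as a list of rows, one row per analysis frequency."""
    n = len(signal)
    rows = []
    for freq in freqs_hz:
        scale = frequency_to_scale(freq, fs)
        half = int(math.ceil(4.0 * scale))  # wavelet is negligible beyond 4 scales
        kernel = [morlet(k / scale).conjugate() / math.sqrt(scale)
                  for k in range(-half, half + 1)]
        row = []
        for b in range(n):
            acc = 0j
            for j, w in enumerate(kernel):
                idx = b + j - half
                if 0 <= idx < n:  # samples outside the signal count as zero
                    acc += signal[idx] * w
            row.append(abs(acc) ** 2)
        rows.append(row)
    return rows


# Hypothetical test tone: 50 Hz sine sampled at 1 kHz.
fs = 1000.0
tone = [math.sin(2 * math.pi * 50.0 * k / fs) for k in range(256)]
freqs = [25.0, 50.0, 100.0]
scalogram = cwt_scalogram(tone, freqs, fs)
mean_power = [sum(row) / len(row) for row in scalogram]
dominant = freqs[mean_power.index(max(mean_power))]
```

For the pure tone, the row centered on the tone's own frequency carries most of the power. In practice a library routine with many more scales would be used, and the resulting matrix would be rendered as an image before being passed to the network.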

Fig. 6.

Confusion matrix showing the performance of AlexNet-CNN on scalograms of the most energetic IMFs obtained with EMD and CWT.
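
Summary metrics such as accuracy follow directly from the four cells of a binary confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical and are not taken from Fig. 6, and treating "pathological" as the positive class is an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from binary confusion-matrix counts.

    tp/fn: pathological voices classified correctly/incorrectly (positive class, assumed).
    tn/fp: healthy voices classified correctly/incorrectly.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for illustration only (NOT the article's results).
metrics = binary_metrics(tp=90, fn=10, fp=20, tn=80)
```

Reporting sensitivity and specificity alongside accuracy matters when the healthy and pathological classes are imbalanced, since accuracy alone can hide poor performance on the smaller class.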

Comparison of our method with recent studies on pathological voice detection.

| Study | Dataset | Features and model | Accuracy [%] |
|---|---|---|---|
| [29] | SVD | Multipeak, Gaussian mixture model (GMM) | 91.83 |
| [30] | SVD + HUPA | MFCCs, SVM | 71.45–76.19 |
| [31] | MEEI voice disorders | MFCC (500 ms frames, 5 ms shift), SVM | 66.4–75.1 |
| [32] | SVD + HUPA | wav2vec, SVM | 68.55–83.11 |
| [33] | SVD + HUPA | Mel-spectrogram, SVM | 69.45–75 |
| [34] | VOICED | wav2vec 2.0, SVM / KNN | 98 |
| [35] | UA-speech + TORGO | MFCCs, SVM | 63.13–89.22 |
| This work | SVD | EMD-IMF, Mel-spectrogram + scalogram, AlexNet-CNN | 85.66 / 86.4 |
Language: English
Page range: 276–283
Submitted on: Oct 21, 2024
Accepted on: Aug 28, 2025
Published on: Oct 15, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: Volume open

© 2025 Sofiane Cherif, Abdelhafid Kaddour, Abdelmoudjib Benkada, Said Karoui, published by Slovak Academy of Sciences, Institute of Measurement Science
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.