Breast cancer is among the most common cancers in women, accounting for more than 2 million new cases every year. Early detection is critical to improving survival rates and treatment outcomes [1]. Mammography is the most widely adopted screening technique, providing detailed images of breast tissue to identify abnormalities such as lumps or microcalcifications [2, 3]. Still, the inherent complexity of breast tissue and imaging noise remain a challenge for diagnostic radiologists. Computer-aided diagnosis (CAD) systems are therefore valuable tools for the automated detection and classification of breast masses, minimizing human error in interpretation and leading to more accurate and timely diagnoses [4,5,6].
Texture and geometric features are considered critical for breast cancer detection due to their ability to represent the structural characteristics of mammographic masses. Gabor filters are significant among texture analysis techniques because they capture frequency- and orientation-specific information. Similarly, gray-level co-occurrence matrix (GLCM) features provide statistical insight into the spatial relationships of pixel intensities, which are significant for differentiating normal, benign, and malignant masses. Notwithstanding the individual strengths of Gabor and GLCM features, their combined use for mammogram mass classification remains uninvestigated in the literature.
The research addresses the gap in mammogram classification by introducing a hybrid feature extraction method based on multi-scale multi-orientation (MSMO) Gabor and GLCM features. MSMO Gabor captures multi-scale and multi-orientation textural information, while GLCM quantifies statistical texture properties such as contrast, energy, correlation, and homogeneity. By combining these advanced texture and spatial feature extraction techniques, the proposed method enhances the discriminative ability of mammographic images, making it highly effective in identifying subtle abnormalities that may indicate early-stage malignancies. The resulting feature set is further optimized through feature selection to improve classifier performance, ensuring accurate differentiation of early-stage cancerous masses from benign and normal tissues. Experiments conducted on two benchmark datasets, MIAS and DDSM [7, 8], using various machine learning classifiers, including support vector machine (SVM), k-nearest neighbors (k-NN), decision tree (DT), random forest (RF), and deep neural network (DNN), validate the effectiveness of this approach, as demonstrated by the high-sensitivity values achieved.
The primary contributions of this study are as follows:
A novel hybrid feature extraction method combining MSMO Gabor and GLCM for robust mammogram mass classification.
Extensive evaluation of the proposed method on two widely used mammogram datasets.
Comparative analysis of multiple machine learning classifiers to identify the most effective approach for the task.
Insights into the role of feature selection in enhancing classification performance.
The results demonstrate that the proposed method outperforms existing techniques in classification accuracy, achieving state-of-the-art (SOTA) performance on the MIAS and DDSM datasets. This study aims to contribute to developing advanced CAD systems, aiding radiologists in early breast cancer detection and diagnosis. The remaining article is organized as follows: Section 2 reviews a literature survey of mammography-based approaches, Section 3 describes the proposed approach, Section 4 presents the experimental results, Section 5 discusses the achieved results, and Section 6 concludes with future directions.
Techniques used for texture feature extraction include wavelet transforms, Gabor filters, Laws texture energy measurements, the gray-level run length matrix (GLRLM), and GLCM [9]. GLCM calculates the co-occurrence of pixel intensities at different spatial offsets, allowing subtle patterns in mammography images to be analyzed. Studies demonstrate that GLCM texture features are practical for breast cancer detection [10]. Similarly, GLRLM, which analyzes uniform run lengths of pixels, has shown reliable performance in distinguishing benign and malignant masses [2]. Laws texture energy measurements, employing defined filters to extract texture information, are also effective in differentiating tumors [3]. A technique called the advanced gray-level co-occurrence matrix (AGLCM) extracts features from preprocessed images, which are then classified using machine learning algorithms [11]. Khan et al. [12] used GLCM for classifying malignant cells in breast cytology images. Samuri et al. [13] compared the performance of three machine learning techniques, (1) Naïve Bayes (NB), (2) neural network (NN), and (3) SVM, using GLCM for feature extraction. Gabor filters utilize bandpass filters to extract orientation- and frequency-specific features and are particularly useful for differentiating benign from malignant tumors [4]. Wavelet transforms analyze signals at multiple resolutions, enabling the detection of spiculated masses, often indicative of malignancy [5]. These methods excel at extracting texture and shape features, which are critical for accurate classification.
Fractal dimension analysis measures complex tumor shapes to differentiate between benign and malignant masses [14]. Texture features, including fractal dimensions, are very discriminative in comparative studies [2, 3]. Multi-Fractal Dimension (M-FD) [4] and compactness-based shape analysis [15] are also used for further classification enhancement. The combination of local and global features has also improved the accuracy of classification [6].
Hybrid approaches, including local seed region growing (LSRG) with spherical wavelet transform (SWT), have shown promising potential in automated mass detection [16, 17]. Using LSRG for region identification and SWT for texture extraction, such methods achieve the best classification results [18,19,20].
Integrating the 2D discrete wavelet transform (DWT) with GLCM has been particularly effective: DWT extracts multi-resolution texture features, while GLCM quantifies their spatial relationships. This combination has achieved high classification accuracy in mammogram studies [21, 22]. Modified Gabor functions have also been employed to enhance feature extraction [23], and an improved multi-fractal dimensional approach has been proposed to derive the region of interest. A convolutional neural network (CNN) classification approach with the Gabor transform for classifying benign and malignant tumors has shown promising results by effectively capturing both spatial and texture features, thereby enhancing the accuracy of tumor differentiation [24, 25].
Advanced strategies include the use of wavelet neural networks (WNNs) with particle swarm optimization (PSO) [26] and hybrid methods integrating GLCM with multi-resolution transforms [27, 28]. Bureau et al. [21] demonstrated the effectiveness of combining statistical GLCM features with wavelet coefficients for mass classification. Multi-resolution methods, such as contourlet and shearlet transforms, have also been successfully utilized for mammogram analysis [29,30,31].
Directional textural characteristics extracted using Gabor filters remain a cornerstone of texture analysis, offering a robust representation of mammographic mass properties [32, 33]. Zernike moments, which capture shape content via orthogonal polynomials, have also been successfully applied for feature extraction [34, 35]. Researchers have combined Gabor and Zernike features, achieving high sensitivity and specificity in mass classification [36]. Lima et al. [37] further integrated Zernike moments with wavelet-derived multi-resolution features, using kernel-based classifiers such as SVM and extreme learning machines (ELMs) to improve diagnostic accuracy. The Gabor filter [38] has been used in conjunction with k-means clustering to extract features from the craniocaudal (CC) and mediolateral oblique (MLO) views of mammograms. A breast cancer prediction system with a novel wavelet-based machine learning approach [39] classifies mammogram images as benign, malignant, or normal. Zebari et al. [60] present a systematic review describing computing approaches for breast cancer detection.
Empirical mode decomposition (EMD), a data-driven technique for signal decomposition, has shown promise for feature extraction in automatic mass classification [37]. Gabor wavelets, combined with feature selection techniques such as PSO [40], enhance the extraction of patterns and textures at multiple angles and scales. These hybrid methods bridge the gap between traditional handcrafted feature extraction and automated deep learning approaches, ensuring both interpretability and high performance [41, 42].
Overall, hybrid techniques combining texture and shape features with advanced machine learning classifiers have significantly enhanced the accuracy and efficiency of mammogram mass classification. These methods leverage the strengths of various feature extraction techniques, demonstrating their potential for aiding radiologists in early breast cancer detection. Table 1 summarizes the above-surveyed approaches. Recent research in breast cancer detection emphasizes integrating traditional texture analysis methods like Gabor filters, GLCM, and Prewitt filters with modern deep learning techniques. These hybrid approaches aim to combine the interpretability of handcrafted features with CNNs efficiency. The highlights include:
Hybrid feature techniques: Studies combining GLCM and Gabor filters with CNNs have achieved notable performance in microcalcification detection, enhancing interpretability while maintaining robust classification accuracy and sensitivity. For example, Hernandez et al. [43] integrated traditional features with CNNs, achieving 89.56% accuracy and addressing dataset imbalances through data augmentation.
Lightweight architectures: Lightweight CNNs, such as those developed by Luna-Lozoya et al. [44], prioritize efficiency and achieve 99.30% accuracy and 95.00% sensitivity using the INbreast dataset. These architectures demonstrate the potential of hybrid approaches in resource-constrained settings.
Explainable AI (XAI): To address skepticism toward black-box models, researchers have adopted XAI frameworks. These systems align deep learning outputs with traditional features like GLCM and Gabor, enhancing clinical acceptance while maintaining diagnostic precision [43].
Data augmentation and unified databases: To mitigate issues like class imbalance, techniques such as scaling, rotation, and unified datasets sourced from diverse imaging modalities have been employed, improving model generalization and reducing diagnostic variability [43, 44].
Literature survey on texture feature extraction techniques
| Technique | Description and findings | References |
|---|---|---|
| GLCM | It calculates the co-occurrence of pixel intensities to analyze texture. Effective for detecting breast cancer in mammography images | [9, 45] |
| GLRLM | Focuses on uniform run lengths of pixels. Demonstrates accuracy in predicting benign vs malignant breast masses. | [2] |
| Laws texture energy | Employs filters to extract texture information. Distinguishes between benign and malignant breast tumors. | [3] |
| Gabor filters | Uses bandpass filters for extracting texture features. Effective in differentiating benign and malignant tumors. | [4, 23] |
| Wavelet transform | Analyzes signals at multiple scales. Useful for detecting spiculated masses in mammograms. | [5, 21, 46] |
| Fractal dimensions | Helps analyze tumor shape for classification of benign vs malignant. Multi-fractal dimensions improve classification accuracy. | [14, 15] |
| Hybrid techniques | Combines LSRG with SWTs for early breast cancer diagnosis. | [18, 47] |
| Zernike moments | Captures image shape content and extracts features for mammogram classification. Achieves high sensitivity and specificity. | [34, 35] |
| EMD | Decomposes signals to intrinsic mode functions for automatic classification. | [37] |
| ELM | Combines shape, texture, and edge features for accurate classification of breast masses. | [42] |
| Hybrid features with CNN | Combines Gabor, Prewitt, and GLCM features with CNN for microcalcification detection, leveraging ReLU activation and data augmentation for improved accuracy and sensitivity. Achieved 89.56% accuracy and 82.14% sensitivity. | [43] |
| Lightweight CNN architectures | Employs lightweight CNNs with minimal layers for efficient mammogram analysis. Notable results include 99.30% accuracy and 95.00% sensitivity with INbreast dataset. | [44] |
| XAI | Integrates Gabor and GLCM features with XAI frameworks, aligning automated detection with interpretable outcomes for clinical acceptance. | [43] |
| Unified databases with augmentation | Tackles dataset imbalances by creating unified databases and employing rotation/scaling augmentations to improve model generalization. | [43, 44] |
CNN, convolutional neural network; ELM, extreme learning machine; EMD, empirical mode decomposition; GLCM, gray-level co-occurrence matrix; GLRLM, gray-level run length matrix; LSRG, local seed region growing; SWT, spherical wavelet transform; XAI, explainable AI.
These developments underline the significance of hybrid feature-based approaches in advancing breast cancer diagnostics, offering reliable, interpretable, and efficient solutions for early detection.
We propose a hybrid feature extraction method that fuses texture and shape features derived from the mammogram mass, aiming to classify masses as normal, benign, or malignant. The proposed method first applies the Gabor wavelet to the mammography image to capture texture information, while GLCM features provide statistical insights into the spatial relationships of pixel intensities, which are critical for distinguishing normal, benign, and malignant masses. Several classifiers were then evaluated for mass classification, of which RF performed best. Combining the discriminative power of Gabor wavelets with the spatial features of GLCM is proposed to enhance mammogram mass classification accuracy. The steps of the suggested hybrid feature-based mammography mass classification using Gabor wavelet and GLCM features are illustrated in Figure 1.

The proposed mass classification system’s flow diagram.
We enhance the mass region in mammogram images using preprocessing, contrast limited adaptive histogram equalization (CLAHE) [48], and postprocessing. First, preprocessing removes noise and artifacts from the mammogram images: a median filter is applied to lessen the impact of noise. The median filter is a nonlinear filter that replaces each pixel with the median of its neighborhood. The neighborhood dimensions are determined from the size of the masses in the mammogram images.
After preprocessing, CLAHE is applied to enhance the contrast of the mammogram images. CLAHE is a nonlinear technique that adjusts the intensity of each pixel based on the intensity distribution in its neighborhood. The contrast enhancement is limited by the maximum slope of the cumulative distribution function of the pixel values in each neighborhood, as outlined in Algorithm 1.
Input: I (image) ∈ [n × m] pixels
Output: Updated [n × m] pixels
1: s ← neighborhood of adjacent pixels
2: for p ∈ I do
3: H ← compute histogram of p and its neighborhood s
4: CDF ← cumulative distribution function of H
5: pnew ← mapping of p based on CDF
6: p ← pnew
7: end for
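As a minimal illustration of Algorithm 1's per-pixel CDF mapping, the remapping can be sketched in Python; note this sketch omits CLAHE's histogram clipping, and the `radius` parameter is an assumed stand-in for the neighborhood size:

```python
import numpy as np

def local_hist_equalize(img, radius=1, levels=256):
    """Naive per-pixel histogram equalization as in Algorithm 1: each
    pixel is remapped via the CDF of its neighborhood histogram.
    (CLAHE additionally clips the histogram before computing the CDF.)"""
    h, w = img.shape
    out = np.zeros_like(img)
    pad = np.pad(img, radius, mode='edge')
    for y in range(h):
        for x in range(w):
            # neighborhood of p (pixel plus its adjacent pixels)
            patch = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            hist, _ = np.histogram(patch, bins=levels, range=(0, levels))
            cdf = hist.cumsum() / patch.size        # normalized CDF
            out[y, x] = int(cdf[img[y, x]] * (levels - 1))
    return out
```

In practice, library implementations such as OpenCV's `createCLAHE` operate on tiles and clip the histogram before computing the CDF, which is far faster than this per-pixel sketch.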
After CLAHE enhancement, the mammogram images are postprocessed to remove any remaining artifacts and noise. A Gaussian filter is applied to the mammogram images to remove any remaining noise. The size of the Gaussian filter is chosen based on the size of the masses in the mammogram images.
To extract the optimal feature for the mass classification, we apply the Gabor wavelet transform on the Region of Interest (ROI) to obtain the Gabor wavelet coefficients. This involves selecting appropriate values for the scale and orientation of the Gabor wavelet. The Gabor wavelet transform can be performed using Eq. (1):
Figure 2 shows the resulting images after applying the Gabor filters to all three classes of mammogram mass images. The figure demonstrates the feature extraction process for the three classes of mammographic images (normal, benign, and malignant) using the MSMO Gabor filters. This process involves three key components for each class. First, the gray-scale images of each class serve as the initial input for feature extraction. Second, the magnitude of the texture features is captured in the output of the MSMO Gabor filters, which enhances the visualization of spatial frequency and orientation information.

Gabor responses on mammogram mass patches of various types of masses.
Input: I ← CLAHE-enhanced mammogram mass image
Output: X ← Feature vector
Notations: σ: Gabor wavelet scales; ψ: Orientations; t: Magnitude threshold
1: Define σ (scales) ← [1, 3, 5, 7, 9, 11, 13, 15] and ψ (orientations) ← [0, π/4, π/2, 3π/4]
2: Initialize X ← [ ]
3: for s ∈ σ do
4: fs = s × s
5: for o ∈ ψ do
6: f ← filterGabor_wavelet(fs, o)
7: f ← Convolve(I, f)
8: m ← √(real(f)² + imaginary(f)²); p ← arctan(imaginary(f)/real(f))
9: mt ← {m ≤ t}
10: Quantize p into N levels
11: g ← GLCM (mt, p, s, o)
12: X ← X + g
13: end for
14: end for
15: return X
This step is critical to uncovering the specific texture patterns of each class. Finally, the orientation maps from the Gabor filters provide directional information for the texture features. These maps convey important spatial relations and orientation-specific details that can be inferred from the mammograms. Overall, Figure 2 shows how MSMO Gabor feature extraction picks out subtle differences among the normal, benign, and malignant classes, demonstrating its pattern-discrimination ability and its potential to improve classification accuracy.
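To make the magnitude and phase computation concrete, a common complex Gabor kernel can be sketched as follows. This is a standard formulation, not necessarily the paper's exact Eq. (1) parameterization, and the wavelength parameter `lam` is an assumption:

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam):
    """Complex Gabor kernel: a Gaussian envelope modulated by a complex
    sinusoid along orientation theta (illustrative parameterization)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(1j * 2 * np.pi * xr / lam)

def magnitude_phase(response):
    # magnitude and phase of a complex filter response, as in the
    # pseudocode above (np.angle computes arctan of imag over real)
    return np.abs(response), np.angle(response)

# filter response at a single location: elementwise product with a patch
rng = np.random.default_rng(0)
patch = rng.random((7, 7))
resp = np.sum(patch * gabor_kernel(7, sigma=2.0, theta=0.0, lam=4.0))
m, p = magnitude_phase(resp)
```

Convolving the full image with one kernel per (scale, orientation) pair yields the MSMO response maps visualized in Figure 2.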
GLCM is used to extract statistical features of an image, like contrast, correlation, energy, and homogeneity.
Energy: The energy feature is the sum of the squared elements of the GLCM and reflects the textural uniformity of the image. It can be calculated using Eq. (2):
Contrast: The contrast feature measures the local variations in the texture and indicates how much the image contrasts. It can be calculated as shown in Eq. (3):
Homogeneity: The homogeneity feature measures the closeness of the GLCM values to the diagonal elements and represents the distribution of similar intensity values. It can be calculated as shown in Eq. (4):
Correlation: The correlation feature computes the linear relationship between the intensity values of neighboring pixels, indicating the presence of repeating patterns or trends. It is defined by Eq. (5):
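The four GLCM statistics of Eqs. (2)-(5) can all be computed from a normalized co-occurrence matrix. A hand-rolled sketch follows (libraries such as scikit-image provide `graycomatrix`/`graycoprops` equivalents); the offset `(dx, dy)` and number of gray `levels` are illustrative parameters:

```python
import numpy as np

def glcm_features(img, levels=4, dx=1, dy=0):
    """Build a normalized GLCM for offset (dx, dy) and return the four
    statistics used here: contrast, energy, homogeneity, correlation."""
    h, w = img.shape
    P = np.zeros((levels, levels))
    for y in range(h - dy):
        for x in range(w - dx):
            P[img[y, x], img[y + dy, x + dx]] += 1   # co-occurrence count
    P /= P.sum()                                      # normalize to probabilities
    i, j = np.indices((levels, levels))
    contrast = np.sum(P * (i - j) ** 2)               # Eq. (3)
    energy = np.sum(P ** 2)                           # Eq. (2)
    homogeneity = np.sum(P / (1 + np.abs(i - j)))     # Eq. (4)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(((i - mu_i) ** 2) * P))
    sd_j = np.sqrt(np.sum(((j - mu_j) ** 2) * P))
    correlation = np.sum((i - mu_i) * (j - mu_j) * P) / (sd_i * sd_j)  # Eq. (5)
    return contrast, energy, homogeneity, correlation
```

In the proposed pipeline these statistics are computed on the thresholded Gabor magnitude and quantized phase maps rather than on raw pixel intensities.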
Once the Gabor features are calculated, we can determine the discriminant power of each feature. Discriminant power will indicate the capacity of a feature to distinguish a benign mass from a malignant one. The discriminant power can be calculated using Eq. (6):
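The exact form of Eq. (6) is not reproduced here; one widely used discriminant-power measure with the same intent is the Fisher ratio (between-class separation over within-class spread), sketched below as an illustrative stand-in rather than the paper's formula:

```python
import numpy as np

def fisher_discriminant(f_benign, f_malignant):
    """Fisher ratio for one feature across two classes: squared difference
    of class means divided by the sum of class variances. Higher values
    indicate stronger ability to separate benign from malignant masses."""
    f_benign = np.asarray(f_benign, dtype=float)
    f_malignant = np.asarray(f_malignant, dtype=float)
    num = (f_benign.mean() - f_malignant.mean()) ** 2
    den = f_benign.var() + f_malignant.var()
    return num / den
```

Features can then be ranked by this score and the lowest-ranked ones dropped during feature selection.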
The classification stage involves the evaluation of various machine learning classifiers to determine the most effective approach for mammogram mass classification. Classifiers play a vital role in identifying and categorizing breast tissue as benign or malignant, aiding early diagnosis and improving treatment outcomes. Machine learning classifiers, trained on features like tumor size, texture, and shape, analyze data from imaging (e.g., mammograms) or biopsies.
A DT is a supervised learning algorithm used for classification and regression tasks. It models decisions as a tree-like structure, with internal nodes representing features, branches indicating decision rules, and leaf nodes denoting outcomes [49]. At each node, a feature is selected to split the data into subsets, aiming to maximize information gain or minimize impurity. The Gini index measures impurity in the dataset and is given by Eq. (7), where n is the number of classes and pi is the proportion of samples belonging to the i-th class.
This formula represents the probability of picking two samples from different classes, with a lower Gini index indicating a purer split, meaning subsets contain predominantly one class. By contrast, entropy, grounded in information theory, measures the uncertainty in the dataset. It is computed as shown in Eq. (8), where pi is the proportion of samples in the i-th class.
Higher entropy corresponds to greater impurity, whereas lower entropy indicates more homogeneous splits. Both metrics aim to find the feature that produces the most homogeneous subsets, improving classification or regression accuracy. While the Gini index is computationally simpler and faster, entropy is more sensitive to class distribution differences, making it a theoretically robust choice in some cases.
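The two impurity measures of Eqs. (7) and (8) reduce to a few lines each, taking the class proportions pi as input:

```python
import numpy as np

def gini(p):
    # Gini index: 1 - sum(p_i^2), Eq. (7); 0 for a pure node
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    # Shannon entropy: -sum(p_i * log2(p_i)), Eq. (8); 0*log(0) taken as 0
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```

For a balanced two-class node (p = [0.5, 0.5]) the Gini index is 0.5 and the entropy is 1 bit; both drop to 0 as the node becomes pure.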
RF is an ensemble learning method combining multiple DTs to improve classification or regression performance [50]. It creates n DTs, each trained on a bootstrapped dataset. Predictions are aggregated via majority voting (classification) or averaging (regression).
The equations describe how RF aggregates predictions from multiple DTs for classification and regression tasks. For classification (Eq. 9), each DT produces a class label prediction Yi(x) for the input x.
The final prediction â is determined by majority voting, selecting the class that appears most frequently among all n tree predictions.
In regression (Eq. 10), each tree predicts a numerical value Ti(x) for the input x, and the final prediction â is calculated as the average of these values across all n trees.
Aggregation methods ensure that the RF model produces robust and accurate predictions by leveraging the diversity of individual DTs while reducing the risk of overfitting.
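The aggregation rules of Eqs. (9) and (10) amount to a mode and a mean over the per-tree predictions:

```python
from collections import Counter

def rf_predict_class(tree_preds):
    # Eq. (9): majority vote over the class labels predicted by the trees
    return Counter(tree_preds).most_common(1)[0][0]

def rf_predict_value(tree_preds):
    # Eq. (10): average of the numerical values predicted by the trees
    return sum(tree_preds) / len(tree_preds)
```

Libraries such as scikit-learn's `RandomForestClassifier` perform this aggregation internally (voting on per-tree class probabilities), so this sketch is only to make the equations concrete.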
SVM is a supervised machine learning algorithm used for classification and regression tasks [51]. It operates by identifying a hyperplane that best separates data points of different classes while maximizing the margin, that is, the distance between the hyperplane and the nearest data points from each class. For a linearly separable dataset, the optimal hyperplane is found by maximizing this margin subject to the constraint that every sample lies on the correct side, as formalized in Eq. (11).
For datasets that are not linearly separable, SVM introduces slack variables ξi to allow for some margin violations, leading to the soft-margin SVM formulation. In this case, the optimization goal becomes minimizing Eq. (12), where C is a hyperparameter that controls the trade-off between maximizing the margin and minimizing misclassification. The constraints are adjusted to yi(kTxi + q) ≥ 1 − ξi and ξi ≥ 0, allowing some data points to either fall within the margin (0 < ξi < 1) or be misclassified (ξi > 1).
For nonlinear data, SVM employs kernel functions to map the input features into a higher-dimensional space, where the data become linearly separable. A kernel function, such as the linear, polynomial, or radial basis function (RBF) kernel, computes the dot product in this transformed space without explicitly performing the transformation, significantly reducing computational complexity. This kernel trick enables SVM to efficiently handle complex, nonlinear classification problems.
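As a sketch of the kernel trick, the RBF kernel evaluates an inner product in an implicit high-dimensional feature space without ever constructing that space; the `gamma` default below is a hypothetical choice:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    """K(x1, x2) = exp(-gamma * ||x1 - x2||^2): equals 1 for identical
    inputs and decays toward 0 as the inputs move apart."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))
```

An SVM only ever needs such pairwise kernel values, which is why the explicit (possibly infinite-dimensional) mapping never has to be computed.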
k-NN is a nonparametric, lazy learning algorithm used for classification and regression. It predicts the output for a given input by finding the k closest data points (neighbors) in the feature space and using their labels or values. The closeness is typically measured using distance metrics like Euclidean, Manhattan, or Minkowski distance [52].
The Euclidean distance formula, given by Eq. (13), measures the straight-line distance between two points p1 and p2 in an n-dimensional feature space, where x1i and x2i are the coordinates of the points.
This metric helps identify the k closest data points (neighbors) in the feature space. For classification, the predicted class label, q̂, is determined using the mode of the labels of the k nearest neighbors, expressed by Eq. (14), where qi is the label of the i-th neighbor and Mk represents the indices of the k nearest neighbors.
For regression, the predicted value is computed as the mean of the outputs of the k nearest neighbors, given by Eq. (15), where yi is the value of the i-th neighbor. These equations collectively define how k-NN utilizes proximity to make predictions.
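Equations (13) and (14) together give the classification rule, which can be sketched as:

```python
import numpy as np
from collections import Counter

def knn_classify(X, y, query, k=3):
    """Plain k-NN: rank training points by Euclidean distance to the
    query (Eq. 13) and return the mode of the k nearest labels (Eq. 14)."""
    d = np.linalg.norm(np.asarray(X, dtype=float) - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]           # indices of the k closest points
    return Counter(y[i] for i in nearest).most_common(1)[0][0]
```

Replacing the final mode with a mean of the neighbors' values gives the regression variant of Eq. (15).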
Although not the main focus, DNNs [53] are included for comparison. They utilize multiple layers to learn complex patterns but require significant computational resources and large training datasets. The classifiers are evaluated with hyperparameter tuning and 10-fold cross-validation to ensure robust performance across datasets. Performance metrics such as accuracy, sensitivity, specificity, and computational efficiency are compared to identify the best-suited classifier for the proposed set of hybrid features.
In our experiments, we used the following architecture as the DNN model. The input layer is sized to the image dimensions (224 × 224 pixels with three RGB channels). The images are preprocessed by resizing and normalization. The model architecture includes convolutional layers for feature extraction, pooling layers for dimensionality reduction, and fully connected layers for classification. The model is compiled with the Adam optimizer and categorical cross-entropy loss, trained on the data, and evaluated on a test set.
Three convolutional layers with 32, 64, and 128 filters are applied in sequence to detect image features, using filter sizes of 3 × 3 or 5 × 5, a stride of 1 or 2, and zero-padding to maintain spatial dimensions. The rectified linear unit (ReLU) activation function introduces nonlinearity. Max pooling reduces the image dimensions while preserving key features. Fully connected layers with varying neuron counts (e.g., 512, 1,024, or 2,048) are used for classification, and the output layer typically has one or three neurons with sigmoid or softmax activation.
The Adam optimizer adjusts the learning rate, while the cross-entropy loss function is used for classification. A learning rate of 0.001 or 0.0001 is typical, with training performed over 10–100 epochs and batch sizes of 16–64.
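Assuming one of the configurations described above (3 × 3 'same' convolutions, each followed by 2 × 2 max pooling with stride 2), the spatial dimensions through the three convolutional blocks can be checked arithmetically:

```python
def shape_after(n, kernel, stride, pad):
    # output side length of a conv/pool layer:
    # floor((n + 2*pad - kernel) / stride) + 1
    return (n + 2 * pad - kernel) // stride + 1

side = 224
for filters in (32, 64, 128):                           # three conv blocks
    side = shape_after(side, kernel=3, stride=1, pad=1)  # 'same' 3x3 conv
    side = shape_after(side, kernel=2, stride=2, pad=0)  # 2x2 max pool
flattened = side * side * 128   # input width of the first dense layer
```

With these assumptions the feature map shrinks 224 → 112 → 56 → 28, so the fully connected layers receive a 28 × 28 × 128 flattened vector.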
The process followed in each experiment is given in Figure 3. The experimental process comprises three critical stages: data preprocessing, feature extraction, and classification. Data preprocessing prepares the raw data: cleaning removes noise, normalization brings the data to a standard range, and augmentation (rotation, flipping, and scaling) makes the model more robust. The dataset is then split into training, validation, and test sets to allow sound evaluation. In the feature extraction phase, two techniques are used: MSMO Gabor filters and GLCM. MSMO Gabor filters extract texture features in a multi-resolution space, capturing spatial frequency and orientation at varying scales and orientations, and forming a detailed feature vector. By contrast, GLCM analyzes spatial relationships between pixel pairs to calculate statistical measures, including contrast, correlation, energy, and homogeneity. Finally, the classification stage uses the extracted features to assign the data to predefined classes. It involves selecting an appropriate algorithm (e.g., SVMs, DTs, or NNs), training the model to learn feature–target relationships, fine-tuning hyperparameters on the validation set to prevent overfitting, and testing the model on a separate dataset to check its accuracy and generalization. This structured methodology ensures a robust and reliable framework for effective classification.

Steps employed in experimentation.
The effectiveness of the proposed method is evaluated by conducting experiments on the MIAS and Curated Breast Imaging Subset of DDSM (CBIS-DDSM, referred to as DDSM throughout the article) mammogram mass datasets. Both databases annotate each mammogram with one of three classes (normal, benign, or malignant). Normal mass regions are cropped from mammograms that contain no suspicious region. The suspicious regions in both datasets are annotated by expert radiologists as either benign or malignant; these are referred to here as abnormal cases and are cropped as mass regions for the experiments. The dataset was obtained from Heath et al. [54].
The MIAS dataset consists of digitized mammograms sourced from different hospitals. These images are annotated by radiology experts, providing ground truth for abnormality detection and classification tasks. CBIS-DDSM is a curated breast imaging subset of the DDSM mammography dataset, a widely used resource for breast cancer research and diagnosis. It contains a diverse collection of mammograms along with clinical information, annotations, and lesion segmentation. After preprocessing, the mammogram images are enhanced using the proposed method and compared with the original images.
The classification performance of mammogram mass feature extraction using GLCM statistical features from Gabor-transformed images is influenced by parameters such as scale (σ) and orientation: σ determines the scale of the filters, while orientation determines their direction. Choosing appropriate values for both is important to capture the relevant texture information without losing important details or introducing noise. Experimentation and optimization are necessary to find the optimal parameter values for a given dataset, leading to an improved classification rate.
The performance of the proposed work is assessed using several classifiers: DTs, RF, k-NN, and SVM. To analyze the classification efficacy, 10-fold cross-validation was used. Performance measures such as specificity, sensitivity, and accuracy are computed. Here, accuracy is the proportion of correctly diagnosed samples, sensitivity is the proportion of correctly classified malignant masses, and specificity is the proportion of correctly classified benign masses.
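These three metrics follow directly from the confusion-matrix counts (true positives, true negatives, false positives, false negatives):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts,
    treating malignant as the positive class as in the text."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # correctly classified malignant masses
    specificity = tn / (tn + fp)   # correctly classified benign masses
    return accuracy, sensitivity, specificity
```

For example, 90 true positives, 80 true negatives, 20 false positives, and 10 false negatives give an accuracy of 0.85, a sensitivity of 0.90, and a specificity of 0.80.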
In this section, we present the results of our experimentation based on various perspectives, such as performance analysis based on various features, classifiers, comparison with SOTAs, and prediction analysis of the proposed approach (Table 2).
Normal, benign, and malignant case distribution in the MIAS and DDSM datasets
| Dataset | Normal | Benign | Malignant | Total images | Description |
|---|---|---|---|---|---|
| MIAS | 209 | 62 | 51 | 322 | Contains digitized mammograms from various sources, annotated with abnormalities by expert radiologists. |
| DDSM | 141 | 771 | 784 | 1,696 | Comprehensive database with mammograms, clinical information, and annotations, primarily used for breast cancer research and diagnosis. |
The ablation study (Table 3) highlights the importance of individual components in the proposed mammogram classification pipeline. The results show that Gabor features alone yielded a classification accuracy of 88.00%, demonstrating their effectiveness in capturing texture-related information. Similarly, GLCM features alone achieved an accuracy of 85.50%, indicating their strength in modeling spatial relationships. Combining Gabor and GLCM features significantly enhanced performance, reaching a classification accuracy of 96.64%. This improvement validates the complementary nature of the two feature sets, as the hybrid approach captures both the texture and the spatial information crucial for mammogram classification. We also observed that raw images combined with RF and DNN outperform Gabor or GLCM features alone as input to these classifiers (as shown in Table 3). The lower accuracy of the individual feature sets is attributed to information loss during feature selection: while feature selection aims to eliminate redundant or irrelevant features, some critical information may be inadvertently excluded. By contrast, raw image features retain all available data, which, when paired with a robust classifier like RF, can sometimes yield higher accuracy. This behavior is examined further in the discussion section.
Performance analysis of various features in the proposed approach on DDSM dataset
| Feature selection | Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%) | F1-score (%) |
|---|---|---|---|---|---|
| Raw images | RF | 92.20 | 91.80 | 92.50 | 92.00 |
| Raw images | DNN | 82.22 | 80.36 | 82.50 | 81.42 |
| Gabor | RF | 88.00 | 87.50 | 88.30 | 87.90 |
| GLCM | RF | 85.50 | 85.00 | 85.80 | 85.40 |
| Gabor + GLCM | SVM | 90.50 | 90.00 | 91.00 | 90.40 |
| Gabor + GLCM | DT | 92.98 | 94.66 | 90.16 | 92.60 |
| Gabor + GLCM | RF | 96.64 | 95.90 | 97.08 | 96.72 |
DNN, deep neural network; DT, decision tree; GLCM, gray-level co-occurrence matrix; RF, random forest; SVM, support vector machine.
Feature selection emerged as a critical step in the pipeline, with its omission resulting in a reduced accuracy of 92.20%. It indicates that feature selection is vital in eliminating redundant or irrelevant features, thereby improving the classifier’s performance. Among the classifiers, RF consistently outperformed others, achieving the highest accuracy of 96.64% with the complete feature set (Gabor + GLCM). SVMs demonstrated moderate performance with an accuracy of 90.50%, reflecting their sensitivity to high-dimensional feature spaces, while DTs achieved 92.98% accuracy but were limited by their tendency to overfit on complex data.
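Importance-based feature selection of the kind discussed here can be sketched with scikit-learn's `SelectFromModel` wrapped around an RF. The synthetic data and the mean-importance threshold below are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in: 200 samples of 64 features, only the first two informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Rank features by RF importance and keep those above the mean importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
selector = SelectFromModel(forest, prefit=True, threshold="mean")
X_selected = selector.transform(X)
```

Because the two informative features absorb most of the importance mass, the mean threshold discards most of the redundant columns while retaining the discriminative ones.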
In terms of sensitivity and specificity, the hybrid feature set combined with RF achieved the best results, with a sensitivity of 95.90% and a specificity of 97.08%. These metrics highlight the robustness of the proposed pipeline in accurately identifying abnormal cases while minimizing false positives (FPs). The omission of feature selection slightly reduced these values, further underscoring its importance. Overall, this study confirms the significance of hybrid feature extraction and feature selection in achieving superior classification performance. It also emphasizes the strength of RF in leveraging optimized features, making the proposed pipeline a promising solution for automated mammogram analysis. Future work could explore advanced feature selection techniques and ensemble-based deep learning models to enhance performance further.
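Sensitivity and specificity follow directly from the confusion-matrix counts. The toy labels below are illustrative only and do not reproduce the paper's results.

```python
from sklearn.metrics import confusion_matrix

# Toy predictions: 1 = abnormal (positive class), 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)          # true positive rate: recall on abnormal cases
specificity = tn / (tn + fp)          # true negative rate: recall on normal cases
f1_score = 2 * tp / (2 * tp + fp + fn)
```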
Gabor features captured the texture information related to frequency and orientation, while the GLCM features provided insight into the spatial relationships of gray-level values. Various classifiers, namely DT, RF, linear and RBF SVM, k-NN, NB, and DNN, were used to evaluate the performance of the work on the MIAS and DDSM datasets. The performance of all the mentioned classifiers with the proposed methodology is compared in Figure 4. These classifiers were chosen for their effectiveness in classification tasks and their wide usage in the literature. The results indicate that RF achieved the highest accuracy of 96.58% on the MIAS dataset and 95.93% on the DDSM dataset, outperforming the other classifiers.

Comparison of performance metrics for various classifiers on the MIAS and DDSM datasets.
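A comparison like the one in Figure 4 can be reproduced in outline with scikit-learn by cross-validating each classifier on the same feature matrix. The synthetic data below merely stands in for the 128-dimensional hybrid feature vectors; hyperparameters are illustrative defaults, not the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 128-dimensional hybrid feature vectors.
X, y = make_classification(n_samples=300, n_features=128, n_informative=10,
                           random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "linear SVM": SVC(kernel="linear"),
    "RBF SVM": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
}

# Mean 5-fold cross-validated accuracy per classifier.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in classifiers.items()}
```

Keeping the folds identical across classifiers makes the accuracy differences attributable to the models rather than to the data split.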
The performance analysis highlights the effectiveness of various classifiers on the MIAS and DDSM datasets for mammogram mass classification. On the MIAS dataset, the DT classifier achieved high accuracy, specificity, and sensitivity, demonstrating its high precision in identifying mammogram masses. The RF classifier also performed exceptionally well, achieving an accuracy of 96.58%, a specificity of 97.08%, a sensitivity of 95.90%, and an F1-score of 96.72%. These results underscore the robustness of these ensemble-based classifiers. Other classifiers, such as k-NN and DNN, showed moderate performance, with accuracies of 66.67% and 72.22%, respectively, reflecting their limited ability to handle complex patterns in the dataset. Meanwhile, the linear SVM, RBF SVM, and NB classifiers achieved lower accuracies, ranging from 60.00% to 62.22%, indicating their limited effectiveness on the MIAS dataset compared with DT and RF.
A similar trend was observed on the DDSM dataset. The DT classifier again achieved good accuracy, specificity, and sensitivity, reinforcing its reliability for this classification task. The RF classifier outperformed the others, achieving an accuracy of 95.93%, a specificity of 94.94%, a sensitivity of 96.77%, and an F1-score of 96.25%. k-NN and DNN also demonstrated reasonable performance on this dataset, with accuracies of 76.49% and 75.47%, respectively, while linear SVM, RBF SVM, and NB showed moderate performance with accuracies ranging from 73.54% to 74.57%.
The receiver operating characteristic (ROC) curve analysis (Figure 5) reveals that the RF classifier performs best, achieving the highest true positive rate (TPR) at minimal false positive rate (FPR), indicating excellent sensitivity and specificity. The DT also performs well, with a strong ROC curve and high discriminatory power, though slightly less effective than RF. By contrast, k-NN demonstrates moderate performance, with a higher FPR at comparable TPR levels, reflecting its susceptibility to FPs. The linear SVM and RBF SVM classifiers exhibit the weakest performance, with flatter ROC curves and higher FPRs, indicating poor sensitivity and a limited ability to distinguish between classes effectively. Overall, RF is the most reliable classifier for the given data, followed closely by DT, while the SVM-based approaches require further optimization to improve their classification performance.

ROC curve comparison for various classifiers. ROC, receiver operating characteristic.
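The ROC analysis above reduces to sweeping a decision threshold over each classifier's scores. A minimal sketch with scikit-learn follows; the scores are hypothetical, not the study's outputs.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical classifier scores: higher means "more likely abnormal".
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.65, 0.30])

fpr, tpr, _ = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
roc_auc = auc(fpr, tpr)                   # area under the ROC curve
```

Plotting `fpr` against `tpr` for each classifier on the same axes yields a figure of the kind shown in Figure 5, with the better classifier's curve hugging the top-left corner.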
Overall, the DT and RF classifiers consistently exhibited excellent accuracy and balanced specificity and sensitivity on both datasets. These findings validate their effectiveness in mammogram mass classification using the MSMO Gabor and GLCM features. The consistent performance of RF, in particular, highlights its robustness in leveraging the hybrid feature set, making it a suitable choice for clinical applications in automated mammogram analysis. While demonstrating moderate to good performance, other classifiers remain less reliable compared with DT and RF.
Table 4 compares the performance of various SOTA approaches for mammogram patch classification based on different features, classifiers, datasets, and accuracy results. Several approaches have demonstrated competitive accuracy rates, with some leveraging advanced feature extraction techniques and classifiers to achieve high performance. Among the notable works, the method proposed by De Belee et al. [55], which utilizes GLCM features and an SVM classifier, achieves an impressive accuracy of 97.46% on the MIAS dataset, highlighting the effectiveness of texture-based features for mammogram classification. Similarly, Beura et al. [21] achieved excellent results, with an accuracy of 99.40% on the DDSM dataset using a combination of Wavelet and GLCM features with a DNN classifier, showcasing the potential of deep learning approaches.
Summary of some related works on mammogram patch classification
| Ref. | Feature | Classifier | Feature dimension | Dataset | Accuracy (%) |
|---|---|---|---|---|---|
| [55] | GLCM | SVM | 200 × 1 | MIAS | 97.46 |
| [56] | Gabor + PCA | SVM | 86,400 × 1 | DDSM | 84 |
| [21] | Wavelet + GLCM | DNN | 120 × 1 | MIAS* (320) | 98.10 |
| | | | | DDSM* (550) | 99.40 |
| [57] | RLTP | RF + SVM | 14 × 2 | MIAS* (376) | 90.00 |
| [58] | HOG, DSIFT, LCP | SVM | 24 × 1, 384 × 1, 80 × 1 | DDSM* (600) | 84 |
| [30] | FFST | SVM | 8,776 × 1 | MIAS* (228) | 97 |
| | | | 35,857 × 1 | DDSM | 100 |
| [59] | Gabor filter | PSO + SVM | 1,080 (9 OWs × 40 GFs × 3 SMs) | DDSM* (1024) | 98.82 |
| Proposed | MSMO Gabor + GLCM | SVM | 128 × 1 | MIAS | 95.82 |
| | | | | DDSM | 93.32 |
| Proposed | MSMO Gabor + GLCM | DT | 128 × 1 | MIAS | 95.91 |
| | | | | DDSM | 94.87 |
| Proposed | MSMO Gabor + GLCM | RF | 128 × 1 | MIAS | 96.58 |
| | | | | DDSM | 95.93 |
*Only a subset of the dataset is utilized in the experimentation.
DNN, deep neural network; DSIFT, Dense Scale Invariant Feature Transform; DT, decision tree; FFST, Fast Finite Shearlet Transform; GFs, Gabor filters; GLCM, gray-level co-occurrence matrix; HOG, Histogram of Oriented Gradients; OWs, overlapping windows; PCA, Principal Component Analysis; PSO, particle swarm optimization; RF, random forest; RLTP, Radial Local Ternary Patterns; SVM, support vector machine.
Likewise, Khan et al. [59] obtained an accuracy of 98.82% using a Gabor filter-based feature set combined with PSO and SVM on the DDSM dataset, emphasizing the utility of Gabor filters for texture analysis in mammogram classification. An even more striking case is that of Gedik et al. [30], who reached 100% accuracy on DDSM using Fast Finite Shearlet Transform (FFST) features and an SVM classifier, although this result may be dataset-specific and subject to certain limitations.
In comparison, the proposed approach with MSMO Gabor and GLCM features combined with the SVM, DT, and RF classifiers achieved accuracies of 95.82%, 95.91%, and 96.58% on MIAS, and 93.32%, 94.87%, and 95.93% on DDSM, respectively. The configuration utilizing MSMO Gabor + GLCM features with RF achieves the best overall accuracy, suggesting that the combination of multi-scale, multi-orientation Gabor filters and GLCM is highly effective at extracting features for breast cancer classification. This configuration outperforms the SVM and DT classifiers on both the MIAS and DDSM datasets, confirming the robustness of the RF classifier in this context. While the performance is strong, it is slightly lower than that of some approaches using DNNs or optimized feature sets. However, the proposed model balances simplicity and effectiveness, providing a good performance benchmark across two standard datasets with a lower-dimensional feature set, which suggests a more computationally efficient model.
Overall, the findings indicate that different combinations of features and classifiers can effectively classify mammogram patches. The proposed method, combining MSMO Gabor and GLCM features with an RF classifier, showed promising results, achieving good accuracy on both the MIAS and DDSM datasets. Thus, feature selection and the choice of classifier play a vital role in mammogram patch classification. Further research and validation are required to assess the generalizability and robustness of these approaches.
The confusion matrices presented in Figure 6 for the proposed model on the MIAS and DDSM datasets show that the model performed excellently on both, maintaining a good balance between sensitivity and specificity with high accuracy. On MIAS, an accuracy of 96.58%, along with a specificity of 97.08%, a sensitivity of 95.90%, and an F1-score of 96.72%, indicates precision in identifying both normal cases and mass abnormalities. On DDSM, although the accuracy dropped slightly to 95.93%, the sensitivity increased to 96.77%, meaning the model did a slightly better job of identifying abnormal cases. The F1-scores for the two datasets are close, 96.72% for MIAS and 96.25% for DDSM, showing that the model is balanced across normal and abnormal instances. Overall, the model shows good classification performance, with a slight edge in normal case identification on MIAS and in abnormal case identification on DDSM.

Prediction of the proposed model on the MIAS and DDSM datasets.
The confusion matrices for the MIAS and DDSM datasets reveal key misclassification issues. FPs occur when normal instances are misclassified as abnormal. For MIAS, 9 normal instances, and for DDSM, 20 normal instances were incorrectly identified as abnormal. This can lead to unnecessary follow-up tests and biopsies. Causes include low-quality images with noise or artifacts and overlapping feature distributions between normal tissue and benign masses. False negatives (FNs) occur when abnormal instances are misclassified as normal, with 2 and 49 instances misclassified in MIAS and DDSM, respectively. FNs are particularly concerning as they may lead to missed diagnoses and undetected malignancies. These errors arise from low-quality images, subtle masses with poor contrast, and overlapping feature distributions between benign and malignant masses. Improving image quality through preprocessing (e.g., noise removal and contrast enhancement) is essential to reduce misclassifications. Additionally, advanced feature extraction methods, like deep learning, could better capture complex patterns in mammograms. Addressing class imbalance in the dataset through augmentation techniques could improve sensitivity and reduce FNs. The model’s accuracy and reliability can be significantly improved by tackling these challenges, ensuring better detection of both normal and abnormal cases in mammogram classification.
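The preprocessing remedies suggested above (noise removal and contrast enhancement) can be sketched with scikit-image. Median filtering followed by CLAHE is one plausible combination, not the paper's prescribed pipeline; the clip limit is an illustrative assumption.

```python
import numpy as np
from skimage import exposure, filters, util

def preprocess(mammogram_u8):
    """Median denoising followed by contrast-limited adaptive histogram
    equalization (CLAHE) on an 8-bit mammogram patch."""
    denoised = filters.median(mammogram_u8)                  # suppress speckle noise
    enhanced = exposure.equalize_adapthist(denoised,         # boost local contrast
                                           clip_limit=0.02)  # returns float in [0, 1]
    return util.img_as_ubyte(enhanced)
```

Enhancing local contrast in this way can make subtle, low-contrast masses easier for the downstream Gabor and GLCM features to separate from surrounding tissue.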
This study introduces a novel hybrid feature extraction technique combining MSMO Gabor wavelets with GLCM features for classifying mammogram masses. The proposed method effectively captures both texture and spatial characteristics, resulting in notable classification accuracy across the MIAS and DDSM datasets. Among the evaluated classifiers, RF delivered the highest performance, with accuracy rates of 96.58% on MIAS and 95.93% on DDSM. The proposed feature selection approach contributed significantly to the model’s efficiency and precision by eliminating irrelevant or redundant features. The performance comparison with competing SOTA methods further indicates the robustness and reliability of this approach. Advanced deep-learning architectures and class re-balancing could be considered for further tuning toward higher specificity or sensitivity in diagnostic settings. More broadly, this work contributes to the advancement of computer-aided diagnostic systems, equipping radiologists with precise and accurate tools for the early detection of breast cancer.
