DEEP-BTS: Deep Learning based Brain Tissue Segmentation using ResU-Net Model

Sivaprakash, P; Banumathi, J; Mishra, Ashis Kumar; Jayapriya, P

doi:10.2478/msr-2025-0041

1. Introduction

The human brain, often regarded as the epicenter of nervous activity, is one of the body’s most vital yet intricate organs [1], [2]. A brain tumor is an abnormal growth of cells in the nervous system [3]. Magnetic resonance imaging (MRI) is a safe, non-invasive technique that produces high-resolution brain images with a range of contrasts [4], [5], and these capabilities have led to its widespread use in diagnosing neurological conditions. There is growing interest in assessing how the brains of infants and adults develop, because MRI provides a powerful tool for studying brain anatomy and function [6], [7] and yields multiple cross-sectional images with varying contrasts [8]. Brain image segmentation is crucial for both basic neuroscience research and the clinical diagnosis of neurological disorders [9], [10]. Given a brain image, usually obtained by MRI, segmentation estimates an annotated (labeled) image in which every voxel is assigned to one of several anatomical or structural regions drawn from a predefined label set [11]. Instead of relying on specialists’ visual inspection, segmentation supports objective diagnosis and study by providing a quantitative assessment of brain tissue volume [12].

The accuracy and processing efficiency of deep learning (DL)-based automatic segmentation techniques are significantly higher than those of conventional techniques [13]. By precisely identifying brain regions of interest and distinguishing them from healthy brain tissue, DL techniques enable more precise quantitative analysis [14]. Many clinical and neurological studies depend on the segmentation of the brain’s gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) [15]. DL-based segmentation supports image-guided procedures, makes surgical planning easier, and allows for the visualization and investigation of anatomical components [16], [17]. Additionally, DL-based techniques have made significant progress in segmenting the brain tissue of adults, newborns, and fetuses [18]. Segmenting brain MRI nevertheless remains challenging: brain tissues are complex and diverse, intensities vary across the image, and the accurate segmentation of patient scans that effective diagnosis and treatment depend on is still difficult to automate. In this paper, a novel DL-based DEEP-BTS model is proposed for brain tissue segmentation (BTS) from brain MRI. The contributions of this work are summarized as follows:

The first step is skull stripping, which removes the skull and scalp from the MRI images. After skull stripping, the images are enhanced using a contrast stretching adaptive trilateral filter (CSATF) to improve tissue contrast and reduce noise while preserving important details.
The denoised images are then fed into the ResU-Net architecture. This model combines U-Net’s efficient segmentation capabilities with ResNet’s residual connections, allowing it to learn complex features of brain tissues while maintaining accuracy.
The ResU-Net model segments the MRI images into different brain regions, including GM, CSF, and WM.

The paper is organized as follows: Section 2 presents the literature survey; Section 3 explains the DEEP-BTS model; Section 4 provides the performance outcomes and comparative analysis; and Section 5 presents the conclusion and future work.

2. Literature survey

In recent years, researchers have proposed numerous approaches to improve the accuracy of BTS. This section summarizes recent machine learning (ML) and DL studies that segment brain tissue from image-based data using advanced computational techniques.

In 2024, Mohammadi et al. [19] proposed a BTS method that addresses intensity non-uniformity artifacts and multiple sclerosis lesions. Compared to previous methods for BTS and multiple sclerosis lesion segmentation, the method demonstrates significant improvement in the Dice index (DI), particularly under high noise and artifact conditions. Experimental results show that it outperforms FCB Former, U-Net, and Attention U-Net in terms of DI for BTS.

In 2024, Kollem [20] introduced a technique for classifying and segmenting MRI brain tumor tissue using an optimal SVM. To achieve a sparse representation of an image’s smooth contour, the contourlet transform uses a twin filter bank structure consisting of a directional filter and the Laplacian pyramid.

In 2024, Gudise et al. [21] proposed a chaos-based enhanced firefly algorithm integrated with fuzzy C-means (CEFAFCM) to separate tissues in brain MRIs. The Firefly Algorithm (FA), with a chaotic map used to initialize the firefly population, is combined with a spatially modified FCM method to form CEFAFCM. Experimental results show that the proposed method outperforms several existing brain MRI segmentation techniques, including FCM, BCFCM, FAFCM, and En-FAFCM.

In 2024, Daoudi and Mahmoudi [22] proposed a method for classifying WM, GM, and CSF tissue in MR brain images. To improve treatment accuracy, the proposed segmentation procedure combines two algorithms: the Whale Optimization Algorithm (WOA) and the Hidden Markov Random Field (HMRF).

In 2021, Veluchamy and Subramani [23] proposed a segmentation method for brain tissue in a medical decision support system. Quantitative parameters such as peak signal-to-noise ratio (PSNR), discrete entropy, specificity (SP), F1 score (F1), accuracy (AC), Jaccard index (JI), and DI are used to compare the proposed approach with other current approaches. Experiments indicate that the proposed technique achieves a reasonable balance between noise and intensity inhomogeneity.

In 2020, Yamanakkanavar and Lee [24] proposed a patch-wise M-net to automatically segment MRI images of the brain. According to experimental data, the proposed approach outperformed state-of-the-art methods, achieving an average segmentation accuracy of 95.44 % for GM, 94.81 % for CSF, and 96.33 % for WM.

In 2021, Long et al. [25] introduced a multi-scale learning U-Net-based encoding-decoding method for BTS in MRI. The study also developed a multi-branch output structure that generates more precise, edge-preserving prediction maps by combining dense neighboring prediction features at different scales during the decoding stage.

In 2023, Karimi et al. [26] proposed a U-Net that learns to segment fetal brain tissue from noisy annotations. The proposed techniques appropriately account for tissue boundary ambiguity. The approach produced results that were significantly more accurate than several advanced techniques, with U-Net being the closest competitor.

The literature shows that existing ML and DL techniques for BTS have several limitations. One major challenge is the intensity inhomogeneity caused by MRI artifacts. It leads to non-uniform brightness across the image, making it difficult to distinguish between different tissue types. Traditional thresholding or clustering methods struggle with this variation, which reduces segmentation accuracy. Additionally, noise and partial volume effects further complicate boundary detection. To address these problems, a novel DEEP-BTS method is introduced for accurate BTS.

3. Proposed DEEP-BTS

In this research, a novel DL-based DEEP-BTS model is developed for BTS using brain MRI images. Fig. 1 shows the DEEP-BTS methodology.

A. Dataset description

Brain MRI scans are extracted from the BrainWeb dataset. The popular synthetic MRI dataset BrainWeb offers controlled situations with varying intensity non-uniformities (RF inhomogeneities) and noise levels. Key characteristics include RF inhomogeneity levels of 0 %, 20 %, and 40 %, which simulate intensity non-uniformities, and noise levels of 0 %, 1 %, 3 %, and 5 %. The training set contains 36 images from all noise and RF levels, the validation set contains 12 images, and the test set contains 57 images.
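The dataset description can be summarized as a small configuration sketch. The level values and split sizes are taken from the text above; the variable names are illustrative, and how the images are distributed over the individual conditions is not stated in the source, so the sketch does not assume it.

```python
from itertools import product

# Values quoted in the dataset description; names are illustrative only.
RF_LEVELS = (0, 20, 40)      # RF inhomogeneity, in percent
NOISE_LEVELS = (0, 1, 3, 5)  # noise level, in percent
SPLIT = {"train": 36, "validation": 12, "test": 57}

# Every (RF, noise) pairing that the listed levels allow.
conditions = list(product(RF_LEVELS, NOISE_LEVELS))

# Share of the whole image set that falls in each split.
total_images = sum(SPLIT.values())
fractions = {name: count / total_images for name, count in SPLIT.items()}

print(f"{len(conditions)} (RF, noise) conditions, {total_images} images")
for name, count in SPLIT.items():
    print(f"{name}: {count} images ({fractions[name]:.1%})")
```

Enumerating the conditions this way makes it easy to confirm that a given split covers every noise and RF setting before training.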

B. Pre-processing

Pre-processing enhances the quality of the MRI images. Skull stripping is performed first, and the skull-stripped image is then filtered using CSATF, which combines two filters: contrast stretching (CS) and an adaptive trilateral filter (ATF).

Contrast stretching

In this denoising phase, each original intensity value is replaced, and histogram comparisons are conducted using a locally modified contrast-stretching adjustment. A new level is assigned to each pixel by applying a flexible transfer function derived from the characteristics of the MRI images.

(1) $Range = \left| Q_{\max} - Q_{\min} \right|$

where Q is the input image and Range is its intensity range; Q_max and Q_min are the maximum and minimum intensity values of the input image. Each pixel is given a new intensity using the following equations:

(2) $X_{k} = \begin{cases} Q_{N} - \sigma_{N}, & \text{if } Q_{N} = Q_{\max} \\ Q_{N} + \sigma_{N}, & \text{if } Q_{N} = Q_{\min} \end{cases}$

(3) $r_{n} = M - \sqrt{\left( Range - M \right)^{2}}$

Each pixel value is altered using the given formulas, where M ranges between 0.01 and 0.02.
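As a rough illustration of the idea behind contrast stretching, the sketch below computes the intensity range of Eq. (1) and applies a plain global min-max stretch. This is a simplified stand-in, not the locally adaptive transfer function of Eqs. (2) and (3); the function names and the output interval [0, 1] are assumptions made for the example.

```python
def intensity_range(image):
    """Eq. (1): Range = |Q_max - Q_min| over all pixels of a 2-D image."""
    flat = [value for row in image for value in row]
    return abs(max(flat) - min(flat))


def stretch_contrast(image, out_min=0.0, out_max=1.0):
    """Global min-max stretch (assumed stand-in for the adaptive version)."""
    flat = [value for row in image for value in row]
    q_min = min(flat)
    value_range = intensity_range(image)
    if value_range == 0:
        # A flat image has no contrast to stretch.
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / value_range
    return [[out_min + (value - q_min) * scale for value in row] for row in image]


# A low-contrast 2x2 patch whose intensities span only 30 grey levels.
low_contrast = [[100, 110], [120, 130]]
stretched = stretch_contrast(low_contrast)
print(intensity_range(low_contrast), stretched)
```

After stretching, the darkest pixel maps to the lower bound and the brightest to the upper bound, so the patch uses the full output interval.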

Adaptive trilateral filter

The MRIs are pre-processed using the ATF to remove noise artifacts. The ATF builds on the principles of the bilateral filter. Bilateral filters smooth high-gradient zones ineffectively, and the trilateral filter resolves this by tilting its kernel. When a bilateral filter is applied to the image data, it averages the highly related surrounding pixels p and excludes dissimilar ones, yielding the tilting angle h_θ of the trilateral filter at the target pixel q.

(4) $h_{\theta}(q) = \frac{1}{l_{\theta}} \sum_{q} \sum_{p} f_{p} \, e(q, p) \, z(f_{q}, f_{p})$
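The bilateral weighting in Eq. (4) can be illustrated with a one-dimensional sketch: each output value is a normalized sum of neighbors f_p, weighted by a spatial kernel e(q, p) and a similarity kernel z(f_q, f_p). The Gaussian form of both kernels, the parameter values, and the application to raw intensities are assumptions for illustration; the source does not specify them.

```python
import math


def gaussian(distance, sigma):
    """Unnormalized Gaussian weight for a given distance."""
    return math.exp(-(distance * distance) / (2.0 * sigma * sigma))


def bilateral_1d(signal, radius=2, sigma_space=1.0, sigma_range=10.0):
    """Bilateral-weighted average of a 1-D signal (assumed Gaussian kernels)."""
    output = []
    for q, f_q in enumerate(signal):
        weighted_sum = 0.0
        normalizer = 0.0  # plays the role of l_theta in Eq. (4)
        for p in range(max(0, q - radius), min(len(signal), q + radius + 1)):
            f_p = signal[p]
            # e(q, p): spatial closeness; z(f_q, f_p): intensity similarity.
            weight = gaussian(q - p, sigma_space) * gaussian(f_q - f_p, sigma_range)
            weighted_sum += weight * f_p
            normalizer += weight
        output.append(weighted_sum / normalizer)
    return output


# A noisy step edge: the dissimilar side of the edge receives almost no weight.
step = [10, 12, 9, 11, 100, 102, 99, 101]
smoothed = bilateral_1d(step)
print([round(value, 2) for value in smoothed])
```

Because dissimilar neighbors are down-weighted, values on either side of the step stay close to their own plateau instead of being blurred across the edge.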

When the kernel is tilted, the trilateral filter’s e(.) and z(.) functions become non-orthogonal. Equation (5) establishes the value of each pixel on this tilted plane.

(5) $j(q, p) = f(q) + h_{\theta} \cdot \left\| q - p \right\|$

where ‖q − p‖ is the multi-dimensional distance between the target pixel q and a neighboring pixel p, f(q) is the value at the target pixel, and h_θ is the tilting angle. To find the output of the trilateral filter, the resulting image is first passed through a bilateral filter, and then the value j is removed from the neighborhood of the target pixel.

(6) $f_{\mathrm{o}}(q) = f_{\mathrm{in}}(y) + t(y) \Delta$

where Δ is the spatial distance between pixels q and p, and f_o(q) is the output function. Tilting improves the filter’s capacity to smooth high-gradient zones. On its own, however, it is insufficient, because the trilateral filter can still fail in areas with significant gradient variations.

Data augmentation is an essential pre-processing technique that uses synthetic data to help the network learn the desired features and generalize better. Fig. 2 shows the pre-processing stages.
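The source does not say which augmentation transforms were used. The sketch below assumes simple flips and a 90-degree rotation, purely as a hypothetical example, and shows the one point that matters for segmentation: the same geometric transform must be applied to the image and to its label mask.

```python
def flip_horizontal(grid):
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in grid]


def flip_vertical(grid):
    """Mirror the rows top to bottom."""
    return [list(row) for row in reversed(grid)]


def rotate_90(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*reversed(grid))]


def augment_pair(image, mask):
    """Return the original pair plus one transformed pair per transform."""
    transforms = (flip_horizontal, flip_vertical, rotate_90)  # assumed set
    pairs = [(image, mask)]
    for transform in transforms:
        # Image and mask receive the identical transform to stay aligned.
        pairs.append((transform(image), transform(mask)))
    return pairs


image = [[1, 2], [3, 4]]
mask = [[0, 1], [1, 0]]
augmented = augment_pair(image, mask)
print(len(augmented), augmented[3])
```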

C. Segmentation

The noise-free images are fed into the ResU-Net [27] model, which segments different brain tissues, including GM, CSF, and WM. To address the issue of training degradation as network layer depth increases, each convolutional (Conv) layer in the U-Net model encoding path is replaced with a residual learning block. Fig. 3 depicts the ResU-Net architecture.

The encoding path consists of three components: an input unit, a residual unit, and a head unit. The head unit includes two conv layers followed by a batch normalization (BN) layer and a ReLU. The first residual unit contains three residual blocks with nine conv layers in total. The second comprises four residual blocks with twelve conv layers, the third six residual blocks with 18 conv layers, and the fourth three residual blocks with nine conv layers. In this work, default parameters were used for the residual unit structures. The decoding path applies four concatenation blocks, one addition block, and an output unit. Each concatenation block applies upsampling and a 1×1 conv, which halves the number of feature channels, and then concatenates the result with the output feature maps from the corresponding residual unit of the encoding path. At the last layer of the decoding path, a 1×1 convolution filter and a sigmoid activation map the segmentation results for binary classification. Copy and crop operations are applied between each residual block’s output feature and the decoding path’s conv layer; they are required for multi-scale feature fusion.
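The layer counts quoted above imply three conv layers per residual block (9/3, 12/4, 18/6, 9/3). The sketch below records that bookkeeping and shows the skip connection of a residual block in its simplest form. The class and function names are illustrative, and placing the ReLU after the addition is an assumption that follows the common ResNet formulation rather than a detail given in the source.

```python
from dataclasses import dataclass

CONVS_PER_BLOCK = 3  # inferred from the block and conv counts in the text


@dataclass(frozen=True)
class ResidualUnit:
    blocks: int

    @property
    def conv_layers(self) -> int:
        return self.blocks * CONVS_PER_BLOCK


# The four residual units of the encoding path, in order.
ENCODER_UNITS = (ResidualUnit(3), ResidualUnit(4), ResidualUnit(6), ResidualUnit(3))


def residual_block(x, transform):
    """Compute ReLU(F(x) + x) element-wise for a flat feature vector."""
    fx = transform(x)
    return [max(0.0, a + b) for a, b in zip(fx, x)]


conv_counts = [unit.conv_layers for unit in ENCODER_UNITS]
print(conv_counts, sum(conv_counts))

# With F(x) = 0 the block reduces to ReLU(x): the input passes through the skip path.
identity_out = residual_block([1.0, -2.0, 3.0], lambda v: [0.0] * len(v))
print(identity_out)
```

The second call illustrates why residual connections ease training in deep encoders: even when the learned transform contributes nothing, the input still reaches the next layer.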

4. Results and Discussion

The proposed model is evaluated using MATLAB 2019b and its DL toolbox. The DEEP-BTS model is assessed using several measures: AC, SP, recall (RE), precision (PR), and F1. The overall accuracy of the DEEP-BTS method is also reported.

Fig. 4 presents the simulation results of the proposed DEEP-BTS model for different input brain MRI samples from the BrainWeb dataset. Column 1 displays the original brain MRI scans, while Column 2 shows the skull-stripped MRI images, which focus on brain tissue. Column 3 illustrates the pre-processed MRI scans used to improve segmentation accuracy. Column 4 depicts variations of the MRI slices generated using augmentation techniques. Columns 5 to 7 present the segmented outputs for the different brain tissues: CSF, WM, and GM.

A. Performance analysis

The proposed DEEP-BTS model was evaluated based on SP, RE, PR, AC, and F1.

(7) $SP = \frac{T_{\mathrm{neg}}}{T_{\mathrm{neg}} + F_{\mathrm{pos}}}$

(8) $RE = \frac{T_{\mathrm{pos}}}{T_{\mathrm{pos}} + F_{\mathrm{neg}}}$

(9) $PR = \frac{T_{\mathrm{pos}}}{T_{\mathrm{pos}} + F_{\mathrm{pos}}}$

(10) $AC = \frac{T_{\mathrm{pos}} + T_{\mathrm{neg}}}{\text{Total no. of samples}}$

(11) $F1 = 2 \left( \frac{PR \times RE}{PR + RE} \right)$

Here, T_neg and T_pos represent the true negatives and true positives of the sample images, while F_neg and F_pos represent the false negatives and false positives of the input images.
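A minimal sketch of how these metrics follow from the four confusion counts is given below. F1 is computed as the harmonic mean of PR and RE, which is its standard definition. The function name and the example counts are hypothetical, and the sketch assumes every denominator is non-zero.

```python
def segmentation_metrics(t_pos, t_neg, f_pos, f_neg):
    """Return SP, RE, PR, AC and F1 from confusion counts (non-zero denominators assumed)."""
    total = t_pos + t_neg + f_pos + f_neg
    sp = t_neg / (t_neg + f_pos)   # specificity
    re = t_pos / (t_pos + f_neg)   # recall
    pr = t_pos / (t_pos + f_pos)   # precision
    ac = (t_pos + t_neg) / total   # accuracy
    f1 = 2 * pr * re / (pr + re)   # harmonic mean of precision and recall
    return {"SP": sp, "RE": re, "PR": pr, "AC": ac, "F1": f1}


# Hypothetical counts for one tissue class, not taken from the paper.
metrics = segmentation_metrics(t_pos=90, t_neg=880, f_pos=10, f_neg=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For multi-class tissue segmentation, the counts would typically be gathered per tissue in a one-versus-rest fashion, giving one set of metrics per class.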

Table 1 shows the classification performance achieved by the proposed DEEP-BTS model for BTS, measured by AC, SP, RE, PR, and F1. The proposed DEEP-BTS model achieves an overall AC of 98.91 % on the BrainWeb dataset, with overall SP, RE, PR, and F1 values of 97.74 %, 97.48 %, 98.24 %, and 96.51 %, respectively.

Table 1. Performance evaluation of the DEEP-BTS.

Types	AC [%]	SP [%]	RE [%]	PR [%]	F1 [%]
CSF	99.12	97.91	97.43	98.76	97.65
GM	98.36	96.75	98.14	97.13	96.14
WM	99.25	98.58	96.87	98.83	95.76
Overall	98.91	97.74	97.48	98.24	96.51
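The Overall row appears to be the unweighted mean of the three tissue rows, to within 0.01 percentage points. That reading is an inference from the numbers in Table 1, not something the text states, and the sketch below simply checks it.

```python
# Per-tissue values copied from Table 1 (percent).
PER_TISSUE = {
    "CSF": {"AC": 99.12, "SP": 97.91, "RE": 97.43, "PR": 98.76, "F1": 97.65},
    "GM": {"AC": 98.36, "SP": 96.75, "RE": 98.14, "PR": 97.13, "F1": 96.14},
    "WM": {"AC": 99.25, "SP": 98.58, "RE": 96.87, "PR": 98.83, "F1": 95.76},
}
REPORTED_OVERALL = {"AC": 98.91, "SP": 97.74, "RE": 97.48, "PR": 98.24, "F1": 96.51}


def macro_average(rows):
    """Unweighted mean of each metric across the tissue classes."""
    metric_names = next(iter(rows.values())).keys()
    return {m: sum(row[m] for row in rows.values()) / len(rows) for m in metric_names}


averages = macro_average(PER_TISSUE)
for name, value in averages.items():
    gap = abs(value - REPORTED_OVERALL[name])
    print(f"{name}: mean {value:.3f}, reported {REPORTED_OVERALL[name]}, gap {gap:.3f}")
```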

Fig. 5(a) and Fig. 5(b) show the AC and loss graphs of the DEEP-BTS model. Fig. 5(a) plots AC against the number of epochs; AC increases as training progresses. The loss curve in Fig. 5(b) shows that the model’s loss decreases as the number of epochs increases. The proposed DEEP-BTS model achieves an AC of 98.91 %.

B. Comparative analysis

In this section, the experimental results of DEEP-BTS are presented, focusing on a comparison of its performance with other segmentation methods. Fig. 6 offers a summary of the outcomes for Graphcut, SegNet, and U-Net, which are widely used in BTS. Segmentation metrics such as the JI and DI are used to assess the effectiveness of each algorithm. These metrics help evaluate the precision and accuracy of the segmentation techniques in various scenarios.

Fig. 6 compares ResU-Net with the other segmentation algorithms based on the JI and DI metrics. The proposed ResU-Net increases the overall DI by 8.63 %, 9.99 %, and 3.07 % over Graphcut, SegNet, and U-Net, respectively. According to Table 2, ResU-Net achieves the highest DI of 98.50 and JI of 97.80 among the Graphcut, SegNet, and U-Net algorithms. This analysis indicates that the proposed ResU-Net delivers the best segmentation performance.
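For reference, the two overlap metrics can be computed from a predicted mask and a ground-truth mask as sketched below, using the standard definitions DI = 2|A ∩ B| / (|A| + |B|) and JI = |A ∩ B| / |A ∪ B|. Masks are represented as sets of pixel coordinates, which is a choice made for the example; the sketch assumes neither mask is empty.

```python
def dice_index(pred, truth):
    """DI = 2|A ∩ B| / (|A| + |B|) for two sets of foreground pixels."""
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def jaccard_index(pred, truth):
    """JI = |A ∩ B| / |A ∪ B| for two sets of foreground pixels."""
    return len(pred & truth) / len(pred | truth)


# Hypothetical masks: four foreground pixels each, two of them shared.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (1, 2), (2, 1)}

di = dice_index(pred, truth)
ji = jaccard_index(pred, truth)
print(f"DI = {di:.4f}, JI = {ji:.4f}")
```

For a single pair of masks the two metrics are linked by JI = DI / (2 − DI), so they always rank one prediction against one ground truth in the same order.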

Table 2. Comparison of existing methods and DEEP-BTS.

Authors	Techniques	DI, CSF [%]	DI, GM [%]	DI, WM [%]
Veluchamy, M. and Subramani, B. (2021) [23]	Fuzzy C-Means	87	89	91
Yamanakkanavar, N. and Lee, B. (2020) [24]	M-Net	87	89	91
Srikrishna, M., et al. (2021) [28]	U-Net	75	79	82
Proposed	ResU-Net	98.33	98.04	99.15

Fig. 7 presents segmentation results for the standard U-Net and ResU-Net. Column 1 shows the original input images, and Column 2 shows the ground truth segmentation images from the BrainWeb dataset. Columns 3 and 4 display the segmented results using U-Net and ResU-Net, respectively. ResU-Net reduces the false positive rate while improving DEEP-BTS performance. Based on the above comparison, the proposed ResU-Net yields a higher DI value than the other segmentation approaches. Its segmentation output is more accurate and more closely aligned with the ground truth, capturing fine structural details and boundaries better than the other methods.

Table 2 shows a comparison of existing and proposed models, including Fuzzy C-Means, M-Net, and U-Net. Different segmentation methods yield varying DI values for BTS. Fuzzy C-Means [23] achieved DI values of 87 % CSF, 89 % GM, and 91 % WM. M-Net [24] produced similar results with 87 % CSF, 89 % GM, and 91 % WM. U-Net [28] performed lower, with 75 % CSF, 79 % GM, and 82 % WM. The proposed ResU-Net outperformed all methods, achieving 98.33 % CSF, 98.04 % GM, and 99.15 % WM, indicating improved segmentation accuracy.

C. Ablation study

In this analysis, the proposed DEEP-BTS model was evaluated with and without skull stripping and CSATF for BTS.

Table 3 presents the comparative performance of DEEP-BTS under different configurations: with and without skull stripping and CSATF. Without skull stripping and CSATF, the model achieved 97.06 % AC, 94.32 % F1, and 95.98 % DI. Incorporating both skull stripping and CSATF resulted in the highest performance, with 98.91 % AC, 96.51 % F1, and 98.50 % DI. These results clearly indicate that pre-processing steps such as skull stripping and CSATF significantly improve the performance of the DEEP-BTS model in accurately segmenting brain tissues, with the combination of both yielding the most effective outcome.

Table 3. Performance comparison of the DEEP-BTS model with and without skull stripping and CSATF.

Metrics	without skull stripping without CSATF	with skull stripping without CSATF	with skull stripping with CSATF
AC	97.06	97.88	98.91
F1	94.32	95.94	96.51
DI	95.98	97.65	98.50
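The contribution of each pre-processing step can be read off Table 3 as a difference between adjacent configurations. The sketch below does that arithmetic in percentage points; the values are copied from the table and the names are illustrative.

```python
# (no stripping, no CSATF), (stripping only), (stripping + CSATF), in percent.
ABLATION = {
    "AC": (97.06, 97.88, 98.91),
    "F1": (94.32, 95.94, 96.51),
    "DI": (95.98, 97.65, 98.50),
}


def stepwise_gains(values):
    """Gain from adding skull stripping, then from adding CSATF on top of it."""
    baseline, with_stripping, with_both = values
    return round(with_stripping - baseline, 2), round(with_both - with_stripping, 2)


gains = {metric: stepwise_gains(values) for metric, values in ABLATION.items()}
for metric, (from_stripping, from_csatf) in gains.items():
    print(f"{metric}: +{from_stripping} from skull stripping, +{from_csatf} from CSATF")
```

Table 3 has no configuration with CSATF alone, so the second gain measures CSATF only when it is applied after skull stripping.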

5. Conclusion

This research introduced a novel DEEP-BTS model for BTS using brain MRI images. The MRI images first undergo skull stripping to remove unnecessary regions. They are then denoised with the CSATF to improve image quality and reduce noise artifacts, and are augmented. The pre-processed images are given to the ResU-Net model, which segments different brain tissues, including CSF, GM, and WM. The proposed ResU-Net increases the overall DI by 8.63 %, 9.99 %, and 3.07 % over Graphcut, SegNet, and U-Net, respectively. In the experiments, the proposed method achieved an AC of 98.91 % in segmenting the brain tissue classes. The proposed ResU-Net outperformed the Fuzzy C-Means, M-Net, and U-Net methods, achieving DI values of 98.33 % for CSF, 98.04 % for GM, and 99.15 % for WM. Future work in BTS could focus on multi-modal MRI fusion, integrating FLAIR, T1-weighted, and T2-weighted images using DL to improve segmentation accuracy, especially for pathological brains.
