The brain, a vital organ in the human body, is responsible for managing emotions, thoughts, behaviors, and sensory processing. Its cerebral cortex contains billions of neurons that facilitate signal transmission. The abnormal growth of brain cells results in a brain tumor (BT) [1]. BT is one of the most severe ailments occurring in the body and reduces normal human life expectancy. Tumors that develop from brain tissue can spread aggressively to other regions [2]. Such abnormalities in the human body may lead to death. BTs are categorized into two types: primary and secondary tumors [3, 4]. A primary tumor forms within the brain tissue itself, often without symptoms. The other type, metastatic BT, forms in another part of the body and eventually spreads to the brain [5]. The most widespread BTs are pituitary adenomas, meningiomas, and gliomas. Several endeavors have been undertaken to diagnose BT efficiently by automatically detecting or classifying the disease [6].
Magnetic resonance imaging (MRI) is one of the primary diagnostic tools used for identifying BT. However, due to the intricate structure of the brain, achieving a precise diagnosis remains complex, demanding, and time-intensive [7]. BTs sometimes cause various symptoms in the body, depending on the structure and position of the tumors, including seizures, balance issues, unusual behavior, confusion, changes in vision, and memory issues [8]. To monitor, treat, analyze, and diagnose the human body, various medical imaging techniques are utilized, such as MRI, computerized tomography (CT), X-rays, and ultrasound imaging (UI). Among these, medical professionals mostly prefer MRI because it is noninvasive and uses nonionizing radiation. Moreover, the MRI technique readily identifies blood circulation in the veins and detects the cells [9, 10]. Many machine learning (ML) and deep learning (DL) algorithms detect BTs from brain MRI image datasets [11]. An MRI scan can produce various image features that can be captured and used to recognize tumor cells. MRI-based tumor diagnosis offers many modalities for separate classification and segmentation [12]. Early-stage diagnosis and detection of cancer cells is critical yet challenging because of the tumor's formation, shape, and dimensions. ML techniques are time-consuming, so advanced DL techniques are preferred [3].
Numerous methods have been established to identify BTs using various techniques, such as ML and DL. One of the most widely used methods is the convolutional neural network (CNN), because it can accurately classify and recognize images. ML is preferred for the identification of cellular structures, molecular identification, image classification, disease prognosis, and tissue segmentation [13]. The analysis, organization, and collection of medical images have become digitalized. Even with cutting-edge methods, the interpretation of medical images requires considerable time and reduces classification accuracy [5]. The CNN approach used to classify multigrade BTs from MRI first segments the images, then augments them, and finally applies pretraining to classify the BT correctly [14]. Tumors can be either small or large, which makes it harder for a model to analyze them, so a balanced dataset is needed to accurately classify brain images and detect the type of disease. However, some datasets used in research are imbalanced, which degrades model performance [15, 16]. The NeuroNet19 model accurately classifies four types of tumor cells, yet it faces challenges related to tumor size and position. Artificial neural networks integrating DL and ML have achieved remarkable success in classifying tumor cells and enhancing overall performance [2].
To address these challenges, this study proposes a novel framework named sonar energy optimized distributed tetra head attention-based convolutional bidirectional network (SEnO-DTCBiNet), which aims to classify BT effectively, ensuring early identification and timely diagnosis and ultimately reducing BT-related fatalities. The MRI images are fed to the classification model, where precise region of interest (ROI) extraction is carried out using a binary thresholding technique to retain only the most relevant features for analysis. The segmentation process allows for precise delineation of tumor regions, which is crucial for reliable diagnosis. Moreover, the multistructural matrix-based statistical deep flow features (MSMSD) capture diverse, rich information, including structural patterns, texture, and statistical properties, yielding more discriminative features for classification. The SEnO-DTCBiNet architecture utilizes the extracted features to differentiate between tumor categories. The core components and strengths of this approach are detailed in the following sections.
Optimized fused attention enabled distributed W-Net based segmentation (OFA-based DW-Net): Integrating tetra-head attention and the SEnO algorithm within the OFA-based DW-Net segmentation process improves accurate tumor region identification. This design combines several attention modules: the bottleneck attention module (BAM) for enhanced feature learning, squeeze and excitation (SE) for channel-specific attention, the ultra-lightweight subspace attention module (ULSAM) for focusing on essential structures, and criss-cross (CC) attention for dynamic contextual information capture. Additionally, fine-tuning W-Net through SEnO allows the model to attain precise segmentation outcomes.
SEnO-DTCBiNet: The SEnO-DTCBiNet combines the distributed tetra head attention (DTHA) and deep BiLSTM layers to capture temporal dependencies and enhance contextual understanding of tumor regions, leading to increased reliability and accuracy in BT classification. Additionally, the SEnO algorithm, obtained by hybridizing Bat, Dolphin echolocation, and Sparrow Search characteristics, effectively fine-tunes the model parameters, resulting in higher efficiency and robustness.
The structure of this paper is outlined as follows: Section II reviews related literature, Section III aligns the approach with the system model, Section IV details the methodology for tumor classification, Section V presents the outcomes, and Section VI concludes the study.
This section reviews the literature, describing the traditional approaches used in this context. Ozkaraca et al. [1] proposed a Dense-CNN approach for classifying BT using MRI data. The model incorporates existing DL networks, such as VGG16 and DenseNet, to process MRI images more effectively. While this approach boosts classification performance, it requires considerable processing time. Anantharajan et al. [11] developed a BT detection method based on DL and ML techniques. After acquisition, the MRI images undergo preprocessing, segmentation, and feature extraction using various techniques. Ultimately, a DL model is employed to detect abnormal brain tissues. The results showed that the suggested model achieved good sensitivity, accuracy, and specificity in classifying normal and abnormal tissues. However, it faces concerns in image classification, as it considers only grayscale images.
Deepa et al. [4] suggested a Chronological Jaya Honey Badger Algorithm-based Deep Residual Network (CJHBA-DRN) for categorizing BTs based on an optimization method. In this approach, DeepMRSeg is used for segmenting the images, and the CJHBA algorithm is introduced for training. After that, features are extracted and augmented, and the CJHBA algorithm categorizes the BT using the dataset. Even though this method performs well in detecting BT, it has some challenges: it considers only selected features, which decreases classification performance. Muezzinoglu et al. [7] introduced a PatchResNet-based model to classify BTs using MRI images. In the developed method, two feature extractors are introduced in addition to three feature selectors. The model's image classification was improved, and the accuracy rate for classifying BT was further enhanced by the iterative hard majority voting (IHMV) technique. The main drawbacks of this model were that the datasets had to be improved and that the absence of a better optimization method reduced classification accuracy.
Rahman and Islam [13] suggested a DL model to detect BT from the datasets. Here, the images are first resized and converted to grayscale, and then augmented to reduce complexity issues. Moreover, this PDCNN method extracts both local and global features and avoids the over-fitting problem through a normalization method. This method has limitations in handling the dataset, which must be enlarged to identify tumors in 3D pictures. Haque et al. [16] suggested a DL-based NeuroNet19 method to detect BTs. VGG19 with a cascading method was utilized to extract global and local features from the image. Explainable artificial intelligence (AI), namely LIME, is employed to increase the model's accountability, improve classification accuracy, and mitigate false negatives. This method shows outstanding performance in both accuracy and promptness. The remaining barriers of this model are that it does not detect binary-class tumors, and that training could be improved with enlarged datasets.
Babu Vimala et al. [2] introduced a DL model to detect and categorize BTs using EfficientNet. Grad-CAM is utilized in the model to highlight the affected areas and classify the tumorous cells in the brain. The fine-tuned EfficientNetB2 demonstrates the generalization and efficiency of the model through testing phases and achieves better performance. However, the method requires considerable training time, which suppresses classification speed. Islam et al. [6] proposed a DL model to diagnose BT cells using MRI images. Here, four architectures are utilized to classify BTs: MobileNet, InceptionV3, DenseNet121, and VGG19. The BT cells are first classified through a fine-tuning method, and a comparison is made among the above approaches, from which MobileNet achieves the best classification accuracy. Although this method enhances classification accuracy, it relies on only a few selected features.
The limitations that occurred during the classification of BT in earlier approaches are discussed below.
In the suggested model [1], the classification accuracy for detecting the BT is improved, but the processing time required is longer.
The EDN-SVM classifier method [11] considers only the gray-scale images to accurately detect the normal and abnormal tissues present in the brain cells, which limits its applicability across diverse types.
In [13], the PDCNN method captures both high- and low-level image features, but expanding the dataset is necessary for improved BT cell classification accuracy.
The EfficientNetB2 method shows better performance with good efficiency, but it requires more training time, which diminishes the speed of detection [2].
In the developed method [6], only selected features are utilized by the model, so the classification process depends on the selected region and overlooks the other tissues in the MRI image.
BT causes disability, dreadful illness, and even death at an early stage. DL models with advanced techniques in the medical field enable diagnosis of BT at an early stage to save human lives. However, researchers still face many challenges in detecting and diagnosing BT because of the brain's complex nature. The main barriers in the existing models are low accuracy, overfitting problems, poor classification performance, and high time complexity. Therefore, to detect the disease accurately with more precision and effectively enhance tumor grading performance, the SEnO-DTCBiNet model is proposed. The input images collected from the datasets are denoted as:
The actual labels of the Brats2020 dataset [18] and Brats2018 dataset [19] are specified as:
The research implemented for an effective BT classification system begins by collecting diverse MRI images acquired from laboratories or hospitals. These images are aggregated in BraTS 2020 [18], BraTS 2018 [19], the BT dataset [17], and a real-time dataset, and are then passed to the ROI extraction process to ensure that only the most relevant and important features are considered for analysis. A series of features is then captured to understand the complex structures of tumor regions more precisely, leading to more discriminative features for classification. These features are fed into the proposed classification system to initiate the training phase. Test features are then applied to the trained model to obtain the tumor grading for differentiating the types of BT, enabling timely and early diagnosis.
This research introduces an approach named the SEnO-DTCBiNet framework, designed for BT detection and classification using MRI scans. Initially, the collected BT images undergo an ROI extraction stage, where significant features are identified and passed on to the OFA-driven DW-Net architecture. Within this model, the SEnO optimization, which integrates Bat optimization, Dolphin echolocation optimization, and Sparrow Search optimization, is utilized to train the model and enhance its effectiveness. Consequently, the segmentation phase is carried out on a W-Net model with four fused attention mechanisms for the accurate segmentation of images. Following this, the segmented output proceeds to a dedicated feature extraction unit, where vital attributes are drawn out using the MSMSD model. The MSMSD model includes three distinct feature types: GLCM-derived features, statistical deep flow-based features, and hybrid 3D structural pattern features. Once these MSMSD features are obtained, they are given to the SEnO-DTCBiNet model to classify BTs such as necrotic and non-enhancing tumor, GD-enhancing tumor, and peritumoral edema. A schematic representation of the SEnO-DTCBiNet architecture is shown in Figure 1.

Block diagram of the SEnO-DTCBiNet framework. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.
The input images are collected from the Brats2020 dataset [18], the BT dataset [17], and the Brats2018 dataset [19]. Let us assume the dataset is D, and it is given by,
The collected input image si is fed to an ROI extraction phase, which extracts the selected ROI from the images for further processing through a binary thresholding mechanism. The original image is converted into a grayscale image with pixel values in the range [0,255]. The threshold value Tr is calculated as the average value of the grayscale image. The intensity of each pixel is compared with the threshold value Tr: when the pixel intensity is higher than Tr, the binary value is set to "1," and the pixel is treated as part of the region of interest; when the pixel intensity is less than Tr, the binary value is set to "0," and those regions are eliminated. To obtain a clear image, a binary mask is applied that retains only the required region. The major purpose of this mechanism is to reduce errors and enhance classification accuracy. In the ROI extraction phase, the output dimension of the images varies for each slice; hence, the images are resized to [512,512,1], which helps the segmentation process to be carried out more smoothly. The resultant output H obtained from the ROI extraction is transferred to the segmentation phase.
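The thresholding-and-resize pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration only: the helper name `extract_roi`, the nearest-neighbour resize, and the synthetic test image are assumptions for demonstration, not the authors' implementation.

```python
import numpy as np

def extract_roi(gray: np.ndarray, out_size: int = 512) -> np.ndarray:
    """Binary-threshold ROI extraction: keep pixels above the mean intensity."""
    tr = gray.mean()                      # threshold Tr = average grayscale value
    mask = (gray > tr).astype(np.uint8)   # 1 = region of interest, 0 = eliminated
    roi = gray * mask                     # apply binary mask to retain the ROI
    # Nearest-neighbour resize to [out_size, out_size, 1] for the segmenter.
    rows = (np.arange(out_size) * gray.shape[0] / out_size).astype(int)
    cols = (np.arange(out_size) * gray.shape[1] / out_size).astype(int)
    return roi[np.ix_(rows, cols)][..., None]

# Example: a synthetic 256x256 slice with a bright square "tumor" region.
img = np.zeros((256, 256), dtype=np.float32)
img[100:140, 100:140] = 200.0
out = extract_roi(img)
print(out.shape)  # (512, 512, 1)
```

Since the background is zero, only the bright region survives the mean-based mask; a real pipeline would typically use a library resize (e.g., interpolation-based) rather than this index trick.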
Segmentation is employed to enhance the precision of BT classification and support more accurate diagnosis. By segmenting the images, the detection process becomes more efficient, and the overall analysis of the medical images is significantly streamlined. Hence, the segmentation process is performed by an OFA-based DW-Net model for identifying the accurate position of the BT. The image obtained from the ROI extraction step serves as the input for the OFA-based DW-Net method [20]. In general, the W-Net structure is considered, where four attention mechanisms are fused, namely, BAM, SE, ULSAM, and CC attention, in a distributed manner to derive the OFA-based DW-Net segmentation model. Additionally, the model’s segmentation component is fine-tuned through an optimization algorithm that operates at the fifth layer of the framework, with detailed insights provided in Section 4.6.
The OFA-based DW-Net segmentation depends on the W-Net model, which reconstructs the input image and predicts the output map without degrading the information. It is composed of two parts, an encoder and a decoder, with 47 convolutional layers arranged in 18 modules. In the encoder path, the down-sampling process is carried out with the fused attention mechanisms of BAM, SE, ULSAM, and CC, arranged in a distributed manner; the up-sampling phase follows. The context is captured through the contracting path, presented as the encoder Zenc, while the expansive path, useful for precise localization, is the decoder Zdec. The encoder part of the segmentation model takes the input H with dimension (N,512,512,1). Moreover, the encoder module is composed of 2 × 2 convolutional layers, each followed by a dropout layer and a max pooling layer, such that the channel dimension increases at each stage of the down-sampling path; conversely, the dimension is reduced at each layer of the up-sampling path in the decoder module. The OFA-based DW-Net architecture is depicted in Figure 2.

OFA-based DW-Net model architecture.
The resultant output obtained at the final dropout layer of the down-sampling process at the encoder side is represented as R, having the dimension [N,32,32,1024], which is further passed to the fused attention framework in a distributed manner; the concatenated output derived through the attention model has the size [N,32,32,4096]. In the up-sampling process, the transposed 2D convolutional layers expand the feature maps, which are combined with the feature maps of equivalent size from the down-sampling path through skip layers, eventually restoring the input dimension H (N,512,512,1). The skip layers are utilized in the up-sampled feature maps for improving region localization. A SoftMax layer is applied after the final convolutional layer in the encoder path to minimize the normalized cut (Ncut) value.
The Ncut loss function is used to eliminate the segmentation noise and smooth the layer. Therefore, the Ncut value is calculated by a mathematical expression as,
BAM [21] is widely used with convolutional network models because it is simple and efficient. The major purpose of this attention is to increase the representation capability of the model by considering spatial and channel attention. Channel attention and spatial attention are two separate branches operating on the channels of the feature map. The input feature is taken as R ∈ ℝc×h×w, and the 3D attention map for BAM is noted as A(R) ∈ ℝc×h×w. Figure 3 represents the structure of BAM.

Bottleneck attention mechanism.
In the channel attention branch, an average pooling layer is applied to the feature map, encoding the information present in each channel. A multilayer perceptron with one hidden layer determines the attention, and a batch normalization layer adjusts the scale parameters. Hence, the channel attention is mathematically expressed as:
Spatial attention is employed to filter out features distributed across different spatial positions. This module utilizes two key hyperparameters: the dilation rate, which helps in capturing contextual information, and the reduction ratio, which manages channel capacity and controls computational overhead. The features are first reduced using a (1 × 1) convolution, then (3 × 3) dilated convolutions are applied, and finally the reduced features are again convolved using a (1 × 1) convolution, with batch normalization applied to the spatial branch output for scale adjustment. Therefore, spatial attention is represented as,
The sigmoid function is denoted by σ, spatial attention is expressed as As(R), and channel attention is denoted by Ac(R). Thus, the BAM output R1 is calculated by adding the original input feature to the attention-refined features, and it is mathematically expressed as,
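The BAM refinement just described, gating the input with the combined channel and spatial attention and adding it back, can be sketched as follows. The sketch assumes the two branch outputs Ac(R) and As(R) have already been computed (the MLP and dilated-convolution branches themselves are omitted), following the standard BAM combination R1 = R + R ⊗ σ(Ac(R) + As(R)):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bam_refine(R, Ac, As):
    """Combine channel attention Ac of shape (c,1,1) and spatial attention
    As of shape (1,h,w) into one 3D map (by broadcasting), then refine the
    input feature R of shape (c,h,w):  R1 = R + R * sigmoid(Ac + As)."""
    A = sigmoid(Ac + As)   # broadcasts to the full (c, h, w) attention map
    return R + R * A

c, h, w = 4, 8, 8
R  = np.ones((c, h, w))
Ac = np.zeros((c, 1, 1))   # placeholder channel-branch logits
As = np.zeros((1, h, w))   # placeholder spatial-branch logits
R1 = bam_refine(R, Ac, As)
print(R1.shape)  # (4, 8, 8); with zero logits, sigmoid = 0.5, so R1 = 1.5 * R
```

The residual form (adding R back) means the block can never suppress the input entirely, which keeps gradients well-behaved.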
The working principle of SE attention [22] depends on the convolutional network model, and the spatial dependencies are figured out by global average pooling. The SE method in image segmentation provides more pixel-wise information about the spatial blocks, which is more accurate. Assume the input of SE is R = {r1, r2, ..., rn} with n channels, such that rk ∈ ℝh×w, where the spatial height and width are indicated as h and w, respectively. The working mechanism of SE is based on two different frameworks, namely the spatial framework and the channel framework. The spatial model employs global average pooling as a squeeze operation, generating a vector q that summarizes the spatial information of each feature map, and it is represented by:
Furthermore, the vector q is processed through two fully connected (FC) layers with the ReLU operator. Finally, the output of the FC layers is given to the sigmoid layer to generate the output
The channel model performs channel squeeze operations using a convolutional layer; its output is denoted as c and can be represented by c = R × wt, where wt denotes the weight factor and c implies the output of the projection tensor. Each projection tensor is further fed to the sigmoid layer to retrieve the output σ(c), and the channel model output is represented by:
Also, the irrelevant features are reduced, and important spatial locations are delivered by the recalibration. The spatial, channel, squeeze, and excitation components are integrated into a single block by concatenating the outputs from both channels.
This combined network of the feature map provides both channel-wise and spatial information, segments the important features, and reduces the complexity problems; the resultant output is denoted as R2. Figure 4 represents the SE structure.
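A minimal sketch of the channel squeeze-and-excitation path described above: global average pooling (squeeze), two FC layers with ReLU and sigmoid (excitation), then channel-wise rescaling. The random weights and the reduction ratio of 2 are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_block(R, W1, W2):
    """Squeeze-and-excitation on R of shape (c, h, w): squeeze via global
    average pooling, excite via two FC layers (ReLU then sigmoid gate),
    then recalibrate each channel of R by its gate value."""
    q = R.mean(axis=(1, 2))               # squeeze: one scalar per channel, (c,)
    z = np.maximum(0.0, W1 @ q)           # FC + ReLU: (c//2,)
    s = 1.0 / (1.0 + np.exp(-(W2 @ z)))   # FC + sigmoid gate in (0,1): (c,)
    return R * s[:, None, None]           # channel-wise recalibration

c, h, w = 8, 16, 16
R  = rng.standard_normal((c, h, w))
W1 = rng.standard_normal((c // 2, c)) * 0.1   # illustrative random weights
W2 = rng.standard_normal((c, c // 2)) * 0.1
out = se_block(R, W1, W2)
print(out.shape)  # (8, 16, 16)
```

Because the gate lies in (0,1), the block can only attenuate channels, never amplify them; this is what "recalibration" means in the SE formulation.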

SE attention. SE, squeeze and excitation.
The structure of ULSAM [23] is designed with convolution and pooling layers such that it learns an attention map in each feature subspace, which minimizes channel and spatial redundancy. Learning the attention map enhances image segmentation and efficiently captures cross-channel feature-map interdependencies. Let us consider a feature map R ∈ ℜb×h×w from the dropout layer, where b is the number of input channels and h × w represents the dimension of the spatial map. The major goal is to effectively capture cross-channel interdependencies in the feature map. The feature map R is divided into d groups, [R1, R2, ..., Rn, ..., Rd]. In ULSAM, the features of each group Rn are extracted through a max pooling layer, a depthwise convolution layer, and a point-wise convolution layer. The structure of ULSAM is depicted in Figure 5.

Structure of ULSAM.
An attention map Attm captures each group's nonlinear dependencies to gather the cross-channel information in a feature map. A gating function with the SoftMax activation is utilized to weight the attention and refine the feature map. The output R3 is obtained by concatenating the feature maps of every group.
The CC attention [24] model is mainly used to capture contextual information from the images in an optimal way, owing to its lightweight computation and memory efficiency in defining local feature representations. This attention mechanism collects vertical and horizontal contextual information to improve the pixel-wise representation. The structure of CC attention is presented in Figure 6.

Structural diagram of CC. CC, criss-cross.
Consider a local feature map R ∈ ℝc×w×h, which is fed into convolution layers to generate the feature maps J, L, and X, where {J, L} ∈ ℝc′×w×h and c′ denotes the number of channels. An attention map is generated after acquiring the feature maps J and L through an affinity operation. At each position τ in the spatial dimension, the feature vectors generated from J and L are indicated as Jτ and Lτ, respectively. Hence, the affinity operation is expressed as Aτ = Jτ Lτ, which refers to the degree of correlation between the feature vectors Jτ and Lτ. Furthermore, a SoftMax operation is applied to Aτ to compute the attention map AM. Moreover, a convolution layer with a [1 × 1] filter is applied over the input feature to create X for the feature adaptation process, and the respective feature vector is denoted as Xτ.
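The affinity-and-aggregation steps can be sketched with a simplified criss-cross attention in NumPy, where each position attends only to the positions in its own row and column. The loop-based form, the function name, and the tiny shapes are illustrative assumptions chosen for clarity, not an efficient implementation:

```python
import numpy as np

def criss_cross_attention(J, L, X):
    """Simplified criss-cross attention. J, L have shape (c', h, w) and X has
    shape (c, h, w). For each position tau, the affinity A_tau = J_tau . L_tau
    is computed against its row and column, softmaxed into an attention map,
    and used to aggregate the values X along that criss-cross path."""
    _, h, w = J.shape
    out = np.zeros_like(X)
    for i in range(h):
        for j in range(w):
            q = J[:, i, j]                                              # query at tau
            keys = np.concatenate([L[:, i, :], L[:, :, j]], axis=1)     # row + column keys
            vals = np.concatenate([X[:, i, :], X[:, :, j]], axis=1)     # row + column values
            a = keys.T @ q                                              # affinity scores A_tau
            a = np.exp(a - a.max()); a /= a.sum()                       # softmax attention map
            out[:, i, j] = vals @ a                                     # contextual aggregation
    return out

rng = np.random.default_rng(1)
J = rng.standard_normal((2, 4, 5))
L = rng.standard_normal((2, 4, 5))
X = rng.standard_normal((3, 4, 5))
ctx = criss_cross_attention(J, L, X)
print(ctx.shape)  # (3, 4, 5)
```

Restricting attention to h + w positions instead of all h × w positions is exactly what makes CC attention lightweight; stacking the operation twice propagates information to the full image.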
The MSMSD-based feature extraction strategy incorporates multiple feature types, including statistical deep flow features, GLCM-based features, and 3D structural pattern features. This approach helps minimize computational complexity and enhances the model’s accuracy.
Hybrid structural pattern features encompass a range of descriptors, including local binary pattern (LBP), local ternary pattern (LTP), and local gradient pattern (LGP).
The effective features from the images are acquired using the statistical approach known as LBP [25]. It considers only the local features of the ROI-extracted images, and it is mathematically expressed as:
β represents the extracted hybrid 3D structural pattern-based features with dimension [N × 120 × 120 × 3].
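A minimal LBP sketch over a 3×3 grayscale neighbourhood: each of the eight neighbours is thresholded against the centre pixel and the resulting bits form an 8-bit code. The neighbour ordering is an assumption (implementations vary), and this 2D toy ignores the [N × 120 × 120 × 3] batch dimensions above:

```python
import numpy as np

def lbp_3x3(img):
    """Local binary pattern: compare each of the eight neighbours with the
    centre pixel; the comparison bits form a code in [0, 255] per pixel."""
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=int)
    center = img[1:-1, 1:-1]
    # Neighbour offsets in clockwise order starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]   # shifted neighbour view
        codes += (nb >= center).astype(int) << bit        # set bit when neighbour >= centre
    return codes

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]], dtype=float)
print(lbp_3x3(img))  # [[120]]
```

For the centre value 5, only the four neighbours 6, 9, 8, 7 exceed it, setting bits 3 through 6 and giving the code 8 + 16 + 32 + 64 = 120.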
Statistical deep flow-based features are used to isolate the ROI from the extracted image, allowing the generation of a deep flow feature map. The statistical features [27] improve the performance of classification accuracy and make the model more reliable. The statistical deep flow feature uses the “cv2.calcOpticalFlowPyrLK” package to acquire the deep flow feature, and it is mathematically denoted as φi. As a result, the statistical features derived include mean, standard deviation, variance, skewness, and kurtosis. Table 1 demonstrates the characteristics obtained from the deep flow feature representation.
Features based on statistical deep flow
| Features | Description | Mathematical notation | Output size |
|---|---|---|---|
| Mean | The ratio between the total intensity of all pixels and the total number of pixels within the deep flow feature image. | T1 = (1/r) Σ_{i=1}^{r} φ_i, where φ_i is the feature and r is the total number of images. | [N,120,120,1] |
| Kurtosis | The shape of the selected images taken for the statistical measurement. | T2 = E[(φ_i − T1)^4] / T3^4 | [N,120,120,1] |
| Standard deviation | The square root of the variance, representing the average deviation of each pixel intensity from the mean. | T3 = sqrt((1/r) Σ_{i=1}^{r} (φ_i − T1)^2) | [N,120,120,1] |
| Skew | A measure of the symmetry of the image. | T4 = E[(φ_i − T1)^3] / T3^3 | [N,120,120,1] |
| Variance | The square of the standard deviation. | T5 = T3^2 | [N,120,120,1] |
Eq. (25) demonstrates the statistical deep-flow-based features with the dimensionality [N,120,120,5].
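The five descriptors of Table 1 can be computed directly in NumPy. The sketch below assumes the standard moment-based definitions of skewness and kurtosis (fourth standardized moment, so a Gaussian gives a value near 3); the function name and the random test map are illustrative:

```python
import numpy as np

def deep_flow_statistics(phi):
    """Compute the five statistical descriptors (T1-T5 of Table 1) over a
    deep flow feature map `phi`."""
    t1 = phi.mean()                          # T1: mean
    t3 = phi.std()                           # T3: standard deviation
    t5 = t3 ** 2                             # T5: variance = std^2
    centred = phi - t1
    t4 = (centred ** 3).mean() / t3 ** 3     # T4: skewness (third standardized moment)
    t2 = (centred ** 4).mean() / t3 ** 4     # T2: kurtosis (fourth standardized moment)
    return {"mean": t1, "kurtosis": t2, "std": t3, "skew": t4, "variance": t5}

rng = np.random.default_rng(42)
phi = rng.standard_normal((120, 120))        # stand-in for one deep flow map
stats = deep_flow_statistics(phi)
print({k: round(float(v), 3) for k, v in stats.items()})
```

On a symmetric Gaussian sample the skewness comes out near 0 and the kurtosis near 3, which is a quick sanity check for the formulas.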
The GLCM [28] method is preferred because of its accuracy and its shorter computation time for feature extraction. It comprises entropy, energy, contrast, homogeneity, and dissimilarity features, which are tabulated in Table 2.
Features based on GLCM
| Features | Overview | Formula | Dimensions of outputs |
|---|---|---|---|
| Homogeneity | Homogeneity calculates the similarity of the texture in the distributed gray-level object pairs. | E3 = Σ_k Σ_l M_kl / (1 + (k − l)^2) | [N,120,120,1] |
| Energy | Energy is used to calculate the uniformity of an image. | E1 = Σ_k Σ_l M_kl^2, where M_kl denotes the GLCM of the image Q. | [N,120,120,1] |
| Entropy | Entropy reflects the complexity of an image present in the GLCM features. | E4 = −Σ_k Σ_l M_kl log(M_kl) | [N,120,120,1] |
| Contrast | Contrast calculates the amount of local variation present in the image. | E5 = Σ_k Σ_l M_kl (k − l)^2 | [N,120,120,1] |
| Dissimilarity | Dissimilarity measures the gaps between the mean variances and the ROI in the grayscale image. | E2 = Σ_k Σ_l M_kl · abs(k − l) | [N,120,120,1] |
ROI, region of interest.
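The Table 2 descriptors can be illustrated with a small NumPy sketch that builds a horizontally adjacent co-occurrence matrix M_kl and derives the five features. Standard GLCM definitions are assumed; real implementations typically also accumulate over multiple offsets and angles:

```python
import numpy as np

def glcm_features(img, levels=8):
    """Build a normalised GLCM M_kl from horizontally adjacent pixel pairs of
    an integer image with values in [0, levels), then compute the five
    descriptors of Table 2."""
    M = np.zeros((levels, levels))
    for k, l in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        M[k, l] += 1                                   # count co-occurring pairs
    M /= M.sum()                                       # normalise to probabilities
    k, l = np.indices((levels, levels))
    eps = 1e-12                                        # guard log(0) in entropy
    return {
        "energy":        np.sum(M ** 2),
        "dissimilarity": np.sum(M * np.abs(k - l)),
        "homogeneity":   np.sum(M / (1.0 + (k - l) ** 2)),
        "entropy":       -np.sum(M * np.log(M + eps)),
        "contrast":      np.sum(M * (k - l) ** 2),
    }

rng = np.random.default_rng(7)
img = rng.integers(0, 8, size=(120, 120))              # quantised 8-level image
feats = glcm_features(img)
print(sorted(feats))
```

Because M is normalised, energy always lies in (0, 1] and homogeneity never exceeds 1, which makes these features easy to sanity-check.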
The equation below represents the extracted GLCM features:
The extracted MSMSD-based features are specified as U with dimension [N,120,120,13], formed by combining the hybrid 3D structural pattern features, the statistical deep flow features, and the GLCM features. These are transferred to the distributed tetra head attention-based convolutional bidirectional network (DTCBiNet) model to classify the BTs.
The DTCBiNet model integrates convolutional and BiLSTM networks and is used for the classification of BT. It comprises several layers: an input layer, a convolutional layer, a Leaky ReLU layer, a max pooling layer, a dropout layer, a reshape layer, an attention mechanism layer, a concatenation layer, a BiLSTM layer, and a dense layer. The DTCBiNet model helps extract complex patterns from the features and boosts performance with better classification accuracy. It also captures both the spatial and temporal relationships of the features, enhancing the model's performance. The attention mechanism in the model mitigates redundancy and effectively maximizes the learning process.
In the DTCBiNet model, the extracted feature map U, with dimension [N,120,120,13], is initially passed as input and fed to the 2D convolutional layer. This layer extracts further features and acts as the main component of the DTCBiNet model. The first convolutional layer extracts low-level features, and higher-level features are extracted by the subsequent convolutional layers. The mathematical equation of a convolutional layer is represented by,

Architecture of DTCBiNet. DTCBiNet, distributed tetra head attention-based convolutional bidirectional network.
The SEnO algorithm is obtained by integrating bat optimization [29], the echolocation-based dolphin optimization [30] algorithm, and the sparrow search algorithm (SSA) [31]. The SEnO algorithm optimizes the bias and weights of the DTCBiNet model by applying the optimization at the end of the BiLSTM layer. In contrast, for training the segmentation approach, the optimization is integrated at the fifth layer to improve the performance of the model.
Bats use echolocation to find prey and to navigate by listening to echoes from their surroundings, allowing them to avoid obstacles in dark conditions. Likewise, dolphins utilize echolocation by generating sounds in the form of clicks to locate obstacles and prey. The SEnO algorithm rapidly improves model performance and reduces overfitting challenges. The development of the SEnO algorithm equips the model with better escaping, grouping, and alarming strategies and enhances the overall model mechanisms.
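Since the SEnO fitness criterion used in this section is classification accuracy, evaluating a candidate solution reduces to comparing its predictions against the true labels. A minimal sketch (the function name `fitness` is an assumption):

```python
import numpy as np

def fitness(true_labels, predicted_labels):
    """SEnO fitness: classification accuracy of the model configured by one
    candidate solution; the search keeps the solution that maximises it."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float((true_labels == predicted_labels).mean())

y_true = [0, 1, 2, 2, 1, 0]   # illustrative three-class labels
y_pred = [0, 1, 2, 1, 1, 0]   # predictions from one candidate solution
print(fitness(y_true, y_pred))  # 0.8333...
```

In the optimization loop, each candidate's weights would be loaded into the classifier, predictions generated on a validation split, and this accuracy used to rank candidates.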
In the initialization phase, first consider the population with a number of solutions h and a total number of solutions t. The algorithm's initial population is mathematically expressed as:
The fitness function of the SEnO algorithm is computed based on the accuracy measure, such that the model's performance is assessed by seeking maximum accuracy, which is represented by:
The SEnO algorithm moves toward the optimal solution using two decision factors, which can be expressed as:
Case (i): Exploration phase: When (Ra < V_i^t), that is, whenever the random variable Ra, which ranges over [0, 1], is less than the pulse-emission rate V_i^t, the exploration phase is implemented.

Subcase (i): Wide search based on velocity: If (S < 0.5), that is, when the random value S drawn from [0, 1] falls below 0.5, this case is considered. The wide-region search capability extends the global search and widens the solution space to find the best optimal solution, using a velocity factor to speed up the exploration toward the best regions. The positional update is expressed as:

F_i^{(t+1)} = ve_i^{(t+1)} + F_i^{(t)} (38)

where F_i^{(t)} denotes the previous iteration solution's position and F_i^{(t+1)} denotes the current iteration solution's position. The wide-search parameter is given by:

F_{i,q}^{(t+1)} = F_{i,q}^{(t)} exp(−i / (κ · t_max)) (39)

where κ is a constant parameter. The velocity parameter can be expressed as:

ve_i^{(t+1)} = ve_i^{(t)} + (F_i^{(t)} − F_best^{(t)}) · fr (40)

where fr represents the frequency parameter, computed as:

fr = fr_min + (fr_max − fr_min) ρ (41)

where ρ ∈ (0, 1) is a random vector drawn from a uniform distribution, and fr_min and fr_max are the minimum and maximum pulse-emission frequencies.

Subcase (ii): Random search: If (S ≥ 0.5), that is, when the random value is greater than or equal to 0.5, this case is implemented. The updated position is mathematically expressed as:

F_i^{(t+1)} = F_rand^{(t)} − Dis · w (42)

where Dis refers to the distance between the target and the solution and w denotes the iteration period; the distance and iteration terms are represented as:

Dis = |2Φ · F_rand^{(t)} − F_i^{(t)}| (43)

w = 2 (1 − y / y_max) (44)

where Φ is a random number in [0, 1], and y and y_max denote the current and maximum number of iterations, respectively. Furthermore, the updated new position equation is denoted by:

F_i^{(t+1)} = κ [(1 / Re)(Re − |z|) Fit(F_i) + F_i^{(t)}] + (1 − κ) [F_rand^{(t)} · Dis · w] (45)

where κ acts as a hybridization factor in the range (0, 1) and Re is the alternative index. Through Eq. (45), the positions are updated so that the solution traverses a new search space, which minimizes the chance of becoming stuck in local optima.
Case (ii): Exploitation phase: If (Pa ≥ V_i^t), this phase is introduced, where the randomly created value Pa is greater than or equal to the pulse-emission rate. The update of the solution depends on the average loudness and the randomness of the solution. The slow convergence rate and the exploitation-phase issues are overcome by generating a new solution, which is represented by:

F_i^{(t+1)} = ∂ [F_i^{(t)} + ϖ · Avg^{(t)}] + (1 − ∂) [|F_opt^{(t)} − F_i^{(t)}| · cos(2πR_5) + F_opt^{(t)}] (46)

where Avg^{(t)} implies the average of all learning-rate parameters, ∂ denotes the hybridization parameter in the range [0, 1], ϖ denotes a random number in [−1, 1], F_opt^{(t)} is the current iteration's optimal solution, and R_5 denotes an arbitrary number between −1 and 1. Eq. (46) can be updated through a quick-movement term, applied when the random variable exceeds the threshold value, which enables the solution to reach the optimal region of the exploitation phase quickly. The updated solution can be mathematically represented by:

F_i^{(t+1)} = F_i^{(t)} + A · B + ∂ [F_i^{(t)} + ϖ Avg^{(t)}] + (1 − ∂) [|F_opt^{(t)} − F_i^{(t)}|] · cos(2πR_5) + F_opt^{(t)} (47)

where A denotes a random number that obeys the normal distribution and B represents a matrix whose elements are all 1. The introduction of the quick-movement factor within the exploitation phase enables the solution to find a better optimal region. Hence, the SEnO algorithm finds the best solution by searching and exploring the location, delivering the best solution with low computational complexity and within minimum time.
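A simplified numpy sketch of one SEnO update step is given below. It follows Eqs. (38)–(44) and (46) in spirit: the pulse-emission rates, random thresholds, and parameter defaults are illustrative assumptions, the random weight ϖ is reused as R5 for brevity, and the hybrid update of Eq. (45) and the quick-movement term of Eq. (47) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def seno_step(F, ve, F_best, t, t_max, fr_min=0.0, fr_max=2.0, part=0.5):
    # One SEnO position update for a population F of shape (n_solutions, dim).
    n, d = F.shape
    F_new = F.copy()
    V = rng.random(n)                        # assumed pulse-emission rate per solution
    for i in range(n):
        if rng.random() < V[i]:              # Case (i): exploration
            if rng.random() < 0.5:           # Subcase (i): wide, velocity-driven search
                fr = fr_min + (fr_max - fr_min) * rng.random(d)   # Eq. (41)
                ve[i] = ve[i] + (F[i] - F_best) * fr              # Eq. (40)
                F_new[i] = F[i] + ve[i]                           # Eq. (38)
            else:                            # Subcase (ii): random search
                j = rng.integers(n)          # a random solution F_rand
                w = 2.0 * (1.0 - t / t_max)                       # Eq. (44)
                Dis = np.abs(2.0 * rng.random(d) * F[j] - F[i])   # Eq. (43)
                F_new[i] = F[j] - Dis * w                         # Eq. (42)
        else:                                # Case (ii): exploitation
            avg = F.mean(axis=0)             # stand-in for the Avg term
            R5 = rng.uniform(-1.0, 1.0)
            F_new[i] = (part * (F[i] + R5 * avg)
                        + (1.0 - part) * (np.abs(F_best - F[i])
                                          * np.cos(2.0 * np.pi * R5) + F_best))  # Eq. (46)
    return F_new, ve

F = np.zeros((4, 3)); ve = np.zeros((4, 3)); Fb = np.ones(3)
Fn, ve = seno_step(F, ve, Fb, t=1, t_max=100)
```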
The SEnO algorithm halts execution once a predefined condition is met, indicating convergence, and it then returns the most optimal solution. The flowchart outlining the SEnO algorithm’s steps is shown in Figure 8.

The flowchart of the SEnO algorithm.
This section describes the results obtained by the SEnO-DTCBiNet model, the evaluation metrics utilized to evaluate the SEnO-DTCBiNet model, the dataset description, and the experimental setup.
The SEnO-DTCBiNet framework was developed using Python 3.7 on a Windows 11 platform. The system setup includes a 13th-generation Intel Core i7-13770K processor, 16 GB of RAM, and an Nvidia GeForce RTX 3080 Ti GPU with 12 GB of dedicated memory. Development was carried out using the PyCharm Community Edition, with storage handled by a 128 GB ROM. The hyperparameters used by the SEnO-DTCBiNet model for classification are displayed in Table 3.
Hyperparameters of the SEnO-DTCBiNet model
| Hyperparameters | Values |
|---|---|
| Kernel size | (3 × 3) |
| Pooling | (2,2) |
| Convolution 2D layers | 2 |
| Activation function | ReLU |
| Learning rate | 0.02 |
| Dropout rate | 0.5 |
| No. of BiLSTM layers | 2 |
| LSTM Units | 64 |
| Loss function | Categorical-cross entropy |
| Optimizer | Adam |
| Number of epochs | 100 |
| Pooling | MaxPooling2D |
| Metrics | Accuracy |
| Padding | Same |
| Stride size | 2 |
SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.
Datasets included in the research are described below:
BT dataset [17]: It consists of 3064 images covering three different kinds of BT. Moreover, the collected images are split into four subgroups and stored in four .zip files.
BraTS 2020 dataset [18]: It comprises MR images from 369 patients in the training set, with four types of MR images gathered from each patient.
Moreover, the validation set consists of MR images of 125 patients, collected from 19 different institutions under several clinical protocols.
BraTS 2018 dataset [19]: The training set comprises 285 patients, of whom 210 have high-grade glioma and 75 have low-grade glioma. Likewise, the validation set includes 66 patients with unknown-grade BTs. The dataset visualization is illustrated in Figure 9.

Dataset visualization. (A) BraTS 2020 dataset, (B) BT dataset, and (C) BraTS 2018 dataset. BT, brain tumor.
Precision: It measures the proportion of the SEnO-DTCBiNet model's positive classifications that are truly positive, and it is denoted as:
Recall: It measures the percentage of all actual positives that the SEnO-DTCBiNet model correctly classifies as positive, and it is represented as:
Accuracy: It measures the SEnO-DTCBiNet model’s overall correctness in classifying data, and it is represented as:
F1-score: It estimates the performance of the SEnO-DTCBiNet model by the harmonic mean of recall and precision, which is denoted as:
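All four metrics can be computed directly from confusion-matrix counts; a minimal sketch (illustrative, not the authors' code):

```python
def classification_metrics(tp, fp, fn, tn):
    # Standard definitions used to score a classifier from true/false
    # positive and negative counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Toy example: 90 true positives, 10 false positives,
# 10 false negatives, 90 true negatives.
p, r, a, f = classification_metrics(90, 10, 10, 90)
```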
The image results of the SEnO-DTCBiNet model across the BraTS 2018 dataset, BraTS 2020 dataset, and the BT dataset are represented in Figures 10–12, respectively.

Image results based on the BraTS 2020 dataset. ROI, region of interest.

BT dataset-based image results. BT, brain tumor; ROI, region of interest.

BraTS 2018 dataset-based image results. ROI, region of interest.
The model performance across the BraTS 2020, BT, and BraTS 2018 datasets for different amounts of training data and epochs is demonstrated in this section.
Figure 13 represents the SEnO-DTCBiNet model performance on the BraTS 2020 dataset, assessed across training percentages. The precision of the SEnO-DTCBiNet model at epoch 100 with training percentages of 50%, 70%, and 90% is 94.21%, 95.87%, and 97.30%, respectively. The SEnO-DTCBiNet model's recall at epoch 60 is 94.45% and at epoch 100 is 97.88%. Likewise, the accuracy and F1-score achieved by the SEnO-DTCBiNet model at epochs 20–100 with 90% of training data are 88.59%, 90.00%, 92.49%, 93.59%, and 97.17% and 87.06%, 87.36%, 88.03%, 88.27%, and 97.25%, respectively.

SEnO-DTCBiNet model performance on the BraTS 2020 dataset. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.
Figure 14 represents the proposed SEnO-DTCBiNet model's effectiveness on the BT dataset. The precision obtained by the SEnO-DTCBiNet model at epochs 100, 60, and 20 for 90% of the training data is 96.86%, 92.58%, and 91.10%, respectively. Similarly, the recall of the SEnO-DTCBiNet model at epoch 40 is 92.31%; it increases to 93.84% at epoch 80 and reaches 98.72% at epoch 100. The SEnO-DTCBiNet model's accuracy at epochs 20–100 is 92.53%, 93.15%, 94.33%, 95.50%, and 98.48%. The SEnO-DTCBiNet model achieved an F1-score of 96.54%, 97.78%, and 97.78% at epoch 100, corresponding to training percentages of 60%, 80%, and 90%, respectively.

Performance of the SEnO-DTCBiNet model using the BT dataset. BT, brain tumor. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.
The SEnO-DTCBiNet model performance on the BraTS 2018 dataset, evaluated under varying training percentages, is presented in Figure 15. The precision of the SEnO-DTCBiNet model at epochs 20–100 is 87.45%, 91.38%, 93.50%, 93.75%, and 94.16% for a training percentage of 90%. Similarly, the recall and accuracy obtained by the SEnO-DTCBiNet model at epochs 60 and 100 with 90% of training data are 90.46% and 97.42%, and 92.29% and 97.76%, respectively. The SEnO-DTCBiNet model's F1-score at epoch 20 is 86.18%; it rises to 89.83% at epoch 60 and finally reaches 95.76%.

Performance of SEnO-DTCBiNet model using the BraTS 2018 dataset. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.
The comparative approaches are CJHBA-DRN [4], EDN-SVM [11], 2D-CNN-CAE [15], PatchResNet [7], Bat-Algorithm-based multihead convolutional bidirectional memory (BA-MCBM) [29], Dolphin swarm algorithm-based multihead bidirectional convolutional memory (DE-MCBM) [30], multihead sonar prey optimized convolutional bidirectional memory (SPO-MCBM), and SSA-based distributed tetra head attention-based convolutional bidirectional network (SSA-DTCBiNet) [31].
The SEnO-DTCBiNet model performance against several traditional approaches on the BraTS 2020 dataset is presented in Figure 16. The precision of the SEnO-DTCBiNet model is 97.30%, which is higher than the EDN-SVM by 12.31%, 2D-CNN-CAE by 8.16%, CJHBA-DRN by 10.25%, BA-MCBM by 7.28%, Dense-CNN by 14.27%, PatchResNet by 9.66%, DE-MCBM by 6.36%, SSA-DTCBiNet by 6.82%, and SPO-MCBM by 1.86% for 90% of the training data. The SEnO-DTCBiNet model achieved a recall of 97.88%, an enhancement of 5.06%, 2.45%, 4.61%, 1.85%, 6.11%, 4.24%, 1.68%, 1.76%, and 0.11% over the EDN-SVM, 2D-CNN-CAE, CJHBA-DRN, BA-MCBM, Dense-CNN, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM, respectively. The accuracy of the SEnO-DTCBiNet model is 97.17%, improved by 10.05% over EDN-SVM, 6.63% over 2D-CNN-CAE, 8.59% over CJHBA-DRN, 6.10% over BA-MCBM, 11.29% over Dense-CNN, 7.49% over PatchResNet, 3.59% over DE-MCBM, 4.85% over SSA-DTCBiNet, and 0.65% over SPO-MCBM for 90% of the training data. Likewise, the SEnO-DTCBiNet model attained an F1-score of 97.25%, an improved value over the other comparative approaches.

SEnO-DTCBiNet model performance comparison on the BraTS 2020 dataset. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.
The SEnO-DTCBiNet model performance against the other conventional models based on K-fold cross-validation using the BraTS 2020 dataset is shown in Figure 17. K-fold cross-validation is performed for KF-4 to KF-10 for the baseline models and the proposed SEnO-DTCBiNet. The SEnO-DTCBiNet model achieved an accuracy of 96.27%, an improvement over Dense-CNN and EDN-SVM of 8.92% and 6.31% at a fold of 10. Likewise, in precision, the proposed model attains a higher value of 95.53%, showing improvements of 5.99% over CJHBA-DRN and 3.44% over PatchResNet. The recall achieved by the SEnO-DTCBiNet framework is 97.74%, an enhancement of 3.69% over 2D-CNN-CAE and 2.52% over BA-MCBM, respectively. The F1-score of the SEnO-DTCBiNet model is 96.62%, which is greater than DE-MCBM by 1.18% and SSA-DTCBiNet by 0.32%.
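K-fold cross-validation, as used in this analysis, partitions the samples so that each fold serves once as the validation set while the remainder trains the model (KF-4 through KF-10). A minimal index-splitting sketch (illustrative, not the authors' pipeline):

```python
def kfold_indices(n_samples, k=10):
    # Assign every k-th sample to the same fold, then pair each fold
    # (validation indices) with the remaining indices (training set).
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    return [(sorted(set(range(n_samples)) - set(f)), f) for f in folds]

splits = kfold_indices(20, k=4)   # toy example: 20 samples, 4 folds
train, val = splits[0]
```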

K-fold analysis on the BraTS 2020 dataset.
The comparative results of the SEnO-DTCBiNet model against the baseline models concerning the training percentage on the BT dataset are presented in Figure 18. The SEnO-DTCBiNet model obtained a precision of 96.86%, which is higher than the comparative EDN-SVM, 2D-CNN-CAE, CJHBA-DRN, BA-MCBM, Dense-CNN, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM approaches. The recall of the SEnO-DTCBiNet model is 98.72%, an improvement of 6.29% over EDN-SVM, 4.03% over 2D-CNN-CAE, 5.83% over CJHBA-DRN, 2.93% over BA-MCBM, 7.06% over Dense-CNN, 5.65% over PatchResNet, 2.61% over DE-MCBM, 1.40% over SSA-DTCBiNet, and 0.20% over SPO-MCBM. The SEnO-DTCBiNet model achieved an accuracy of 98.48%, which is higher than EDN-SVM by 9.57%, 2D-CNN-CAE by 6.80%, CJHBA-DRN by 7.46%, BA-MCBM by 6.76%, Dense-CNN by 10.06%, PatchResNet by 7.24%, DE-MCBM by 4.23%, SSA-DTCBiNet by 2.40%, and SPO-MCBM by 0.57% for 90% of the training data. The F1-score of the SEnO-DTCBiNet model is 97.98%, an enhancement of 8.62%, 5.72%, 8.40%, 4.96%, 9.89%, 7.08%, 2.78%, 1.58%, and 0.38% over the EDN-SVM, 2D-CNN-CAE, CJHBA-DRN, BA-MCBM, Dense-CNN, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM, respectively.

Comparison of the SEnO-DTCBiNet model based on the training percentage using the BT dataset. BT, brain tumor. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.
The proposed SEnO-DTCBiNet model’s performance against the other traditional models, based on the K-fold using the BT dataset, is depicted in Figure 19. The proposed SEnO-DTCBiNet model achieved 96.23% of accuracy, showing improved performance over Dense-CNN by 7.18% and EDN-SVM by 4.92%. Additionally, its precision reached a high value of 95.49%, outperforming CJHBA-DRN by 4.65% and PatchResNet by 4.31%. The SEnO-DTCBiNet framework also demonstrated a superior recall value of 97.71% representing an enhancement of 0.66% and 0.33% over 2D-CNN-CAE and BA-MCBM. Furthermore, the model’s F1-score reached 0.81% at fold-10, exceeding DE-MCBM by 1.06% and SSA-DTCBiNet by 0.81%. The proposed SEnO-DTCBiNet framework significantly advances BT segmentation by achieving improved performance. This makes it a more effective tool for analysis compared to the conventional methods.

K-fold analysis on the BT dataset. BT, brain tumor.
Figure 20 presents the performance comparison on the BraTS 2018 dataset, where the proposed SEnO-DTCBiNet model attained a precision of 94.16%, an improvement of 4.47%, 1.48%, 4.32%, 1.15%, 5.12%, 2.66%, 1.08%, 0.70%, and 0.28% over the EDN-SVM, 2D-CNN-CAE, CJHBA-DRN, BA-MCBM, Dense-CNN, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM. The SEnO-DTCBiNet model's recall and accuracy with 90% of training data are 97.42% and 97.76%, respectively. The F1-score of the SEnO-DTCBiNet model is 95.76%, which is greater than the EDN-SVM by 5.73%, 2D-CNN-CAE by 3.00%, CJHBA-DRN by 4.57%, BA-MCBM by 2.29%, Dense-CNN by 6.29%, PatchResNet by 3.68%, DE-MCBM by 1.90%, SSA-DTCBiNet by 1.35%, and SPO-MCBM by 0.34%.

Comparison of the SEnO-DTCBiNet model based on training percentage using the BraTS 2018 dataset.
The model evaluation using K-fold cross-validation against the baseline models on the BraTS 2018 dataset is depicted in Figure 21. With K-fold 10, the SEnO-DTCBiNet model achieved an accuracy of 95.51%, exceeding that of Dense-CNN and EDN-SVM by 6.70% and 5.96%, respectively. The precision also registered a higher value of 95.37%, showing improvements of 3.68% over CJHBA-DRN and 2.45% over PatchResNet. The SEnO-DTCBiNet framework's recall reached 95.79%, outperforming 2D-CNN-CAE by 3.77% and BA-MCBM by 2.75%. Finally, at fold-10, the model secured an F1-score of 96.62%, which is 1.18% greater than DE-MCBM and 0.32% greater than SSA-DTCBiNet. By consistently beating multiple baseline models across different K-folds, the SEnO-DTCBiNet model demonstrates superior performance and robustness.

K-fold based on the BraTS 2018 dataset.
The SEnO-DTCBiNet model performance evaluation across the other baseline models using the real-time dataset is presented in Figure 22. The SEnO-DTCBiNet model achieved 96.35% accuracy, performing better than the Dense-CNN by 5.40%, EDN-SVM by 4.46%, and CJHBA-DRN by 4.06% when trained on 90% of the data. The model attains a precision rate of 95.98%, showing improvements over other models such as 2D-CNN-CAE, BA-MCBM, and PatchResNet by 2.10%, 1.81% and 2.82% respectively. The SEnO-DTCBiNet model achieved a recall value of 97.10%, which is 1.25% better than DE-MCBM and 0.21% better than SSA-DTCBiNet. Additionally, the F1-score of the model is 96.54%, with a gain over SPO-MCBM of 0.38%. Overall, these results indicate that the SEnO-DTCBiNet model offers an effective performance enhancement, particularly in the context of BT classification, when compared to conventional models.

Comparison of the SEnO-DTCBiNet model on the real-time dataset concerning training percentage. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.
Figure 23 represents the SEnO-DTCBiNet model outcomes against the traditional methods based on K-fold cross-validation on the real-time dataset. The SEnO-DTCBiNet model achieved an accuracy of 96.47%, surpassing Dense-CNN by 8.78% and EDN-SVM by 6.23% at K-fold 10. Furthermore, the model demonstrated a high precision of 95.78%, outperforming approaches such as 2D-CNN-CAE, BA-MCBM, and PatchResNet by 2.38%, 2.33%, and 3.52%. In recall, the SEnO-DTCBiNet reached 97.87%, which is 0.54% higher than DE-MCBM and 0.43% above SSA-DTCBiNet. Its F1-score also shows a higher value of 96.81%, an improvement over SPO-MCBM of 0.80%. These results highlight the SEnO-DTCBiNet framework as a significant advancement in performance, particularly in BT classification, compared to other conventional models.

K-fold analysis on the real-time dataset.
Figure 24 portrays the receiver operating characteristic (ROC) curves of the SEnO-DTCBiNet framework against the other comparative models, including the CJHBA-DRN, BA-MCBM, 2D-CNN-CAE, Dense-CNN, EDN-SVM, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM methods, based on the three datasets. The ROC analysis indicates that the SEnO-DTCBiNet model performs significantly better than the other comparative methods.

ROC analysis. (A) BraTS 2020 dataset, (B) BT dataset, and (C) BraTS 2018 dataset. BT, brain tumor.
The time complexity of the SEnO-DTCBiNet model against other existing methods using the BraTS 2020, BT, and BraTS 2018 datasets is depicted in Figure 25. The analysis reveals that the SEnO-DTCBiNet model requires less time for tumor classification as the number of iterations increases, outperforming the comparative approaches.

Time complexity analysis. (A) BraTS 2020 dataset, (B) BT dataset, and (C) BraTS 2018 dataset. BT, brain tumor.
The convergence analysis of the SEnO optimization compared to approaches such as RMSProp, Adam, BA, SSA, DEA, and SPO is depicted in Figure 26. The analysis indicates that SEnO's loss decreases more effectively than its counterparts as the number of epochs increases. At epoch 50, the SSA and DEA methods reach losses of 6.73 × 10^−7 and 3.13 × 10^−6, whereas the proposed optimization reaches a loss of 1.00 × 10^−6. Similarly, at epoch 99, the proposed optimization maintains a low loss of 1.00 × 10^−6, whereas the existing RMSProp, Adam, and BA approaches reach losses of 0.630, 0.002, and 0.001. These findings therefore demonstrate the effectiveness of the SEnO optimization in achieving an optimal solution for classifying the BT.

Convergence analysis. SSA, Sparrow Search Algorithm.
Figure 27 presents the computational complexity of the proposed BT detection model and various other methods in terms of FLOPs, where one FLOP is a single arithmetic operation, such as an addition or multiplication, performed on floating-point numbers. At epoch 50, existing methods exhibit high FLOP requirements, with Dense-CNN reporting 1.59 × 10^8 and EDN-SVM reporting 1.45 × 10^8, whereas the proposed model utilizes fewer FLOPs, at 1.42 × 10^8. Similarly, at epoch 99, the SSA-DTCBiNet and SPO-MCBM use 2.08 × 10^8 FLOPs; in comparison, the proposed SEnO-DTCBiNet model reduces the computational cost to 2.05 × 10^8 FLOPs through the SEnO algorithm, which configures the model to enhance classification performance.
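As a rough illustration of how such FLOP counts arise, the cost of a single Conv2D layer can be estimated from its output size, kernel, and channel counts. The input shape below follows the model's [120, 120, 13] input; the filter count of 32 and "same" padding are assumptions for illustration.

```python
def conv2d_flops(h_out, w_out, c_in, c_out, k=3):
    # Each output position performs k*k*c_in multiply-accumulates for each
    # of c_out filters; the factor 2 counts a multiply and an add as
    # separate floating-point operations.
    return 2 * h_out * w_out * c_out * (k * k * c_in)

# First convolutional layer on a 120 x 120 x 13 input with 32 assumed
# filters and "same" padding (output stays 120 x 120):
flops = conv2d_flops(120, 120, 13, 32)
```

Under these assumptions the first layer alone costs about 1.1 × 10^8 FLOPs, the same order of magnitude as the per-model totals reported above.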

Computational complexity analysis in terms of FLOPS. FLOPS, floating-point operations per second.
As described in Figure 28, an ablation study is performed to evaluate the individual contributions of the architectural components of the proposed model. Starting with a BiLSTM, the model achieved 93.68% accuracy, effectively capturing temporal dependencies. Incorporating the SEnO optimizer with a CNN improved performance to 93.96%, showing the benefit of optimized weight updates in the convolutional network. The CBiNet model individually reached 94.64%, while the combination of SEnO + BiLSTM yielded 95.24%, demonstrating enhanced training stability. Adding the tetra head attention to CBiNet further increased performance to 95.74%, and SEnO + CBiNet reached 96.35% by unifying optimization and the hybrid architecture. Finally, the complete SEnO-DTCBiNet model achieved an accuracy of 97.18%, validating the effectiveness of combining CNN, BiLSTM, SEnO optimization, and multihead attention in one unified framework. These results confirm that each component contributes a layered improvement in classification, with the fully integrated model delivering the most robust performance.

Ablation study for SEnO-DTCBiNet model components. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.
An ablation study focused on the impact of individual attention mechanisms and their combinations is depicted in Figure 29. When used independently, BAM, SE, ULSAM, and CC achieved an accuracy of 87.57%, 85.68%, 91.55%, and 92.54% respectively, each excelling in different aspects, such as channel recalibration and spatial attention. Combining BAM + SE improved accuracy to 95.55%, and CC + BAM + SE achieved 94.55%, reflecting the complementary effect of combining spatial and channel attention. The trio of ULSAM + SE + CC achieved 96.46% of accuracy, showcasing the synergy between cross-channel, lightweight, and contextual attention mechanisms. Ultimately, the full DTHA configuration combining BAM, SE, ULSAM, and CC achieved the peak accuracy of 97.18%, proving that fusing diverse attention modules significantly boosts feature representation. These attention mechanisms, when integrated into the proposed model, enhance fine-grained tumor localization and classification, making SEnO-DTCBiNet a highly effective tool in classifying the BT.

Ablation study for attention mechanisms. CC, criss-cross; DTHA, distributed tetra head attention; SE, squeeze and excitation.
The SEnO-DTCBiNet model's confusion matrices across the BraTS 2020, BT, and BraTS 2018 datasets are represented in Figure 30. These matrices serve to validate the model's performance by comparing its classifications against the actual labels.

Confusion matrix.
The statistical analysis for the proposed SEnO-DTCBiNet model is presented in Table 4 for the BraTS 2020 dataset, Table 5 for the BT dataset, Table 6 for the BraTS 2018 dataset, and Table 7 for the real-time dataset.
BraTS 2020 dataset-based statistical analysis
| Metric | Statistic | PatchResNet | BA-MCBM | Dense-CNN | CJHBA-DRN | EDN-SVM | 2D-CNN-CAE | SPO-MCBM | SSA-DTCBiNet | DE-MCBM | SEnO-DTCBiNet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | Best | 92.58 | 93.91 | 87.68 | 90.85 | 90.19 | 93.05 | 94.89 | 95.92 | 96.16 | 96.27 |
| Accuracy | Mean | 87.40 | 88.59 | 84.81 | 86.49 | 85.78 | 87.96 | 89.33 | 90.24 | 90.70 | 92.34 |
| Accuracy | Variance | 14.65 | 13.72 | 4.27 | 10.55 | 9.03 | 12.63 | 14.05 | 15.28 | 13.50 | 10.02 |
| Accuracy | Standard deviation | 3.83 | 3.70 | 2.07 | 3.25 | 3.01 | 3.55 | 3.75 | 3.91 | 3.67 | 3.16 |
| Precision | Best | 92.24 | 93.22 | 87.57 | 89.81 | 89.33 | 92.50 | 93.65 | 95.11 | 95.39 | 95.53 |
| Precision | Mean | 87.02 | 87.80 | 84.92 | 85.98 | 85.69 | 87.28 | 88.50 | 89.46 | 89.89 | 90.51 |
| Precision | Variance | 13.94 | 14.22 | 4.09 | 7.56 | 7.31 | 14.04 | 12.45 | 14.85 | 13.31 | 12.81 |
| Precision | Standard deviation | 3.73 | 3.77 | 2.02 | 2.75 | 2.70 | 3.75 | 3.53 | 3.85 | 3.65 | 3.58 |
| Recall | Best | 93.26 | 95.28 | 87.89 | 92.94 | 91.92 | 94.14 | 97.38 | 97.55 | 97.69 | 97.75 |
| Recall | Mean | 88.17 | 90.17 | 84.59 | 87.50 | 85.94 | 89.34 | 91.00 | 91.78 | 92.31 | 95.99 |
| Recall | Variance | 16.34 | 13.51 | 4.76 | 18.41 | 13.76 | 10.43 | 17.68 | 16.26 | 13.95 | 7.93 |
| Recall | Standard deviation | 4.04 | 3.68 | 2.18 | 4.29 | 3.71 | 3.23 | 4.20 | 4.03 | 3.74 | 2.82 |
| F1-score | Best | 92.75 | 94.24 | 87.73 | 91.35 | 90.61 | 93.31 | 95.48 | 96.31 | 96.53 | 96.63 |
| F1-score | Mean | 87.59 | 88.96 | 84.76 | 86.73 | 85.81 | 88.29 | 89.73 | 90.61 | 91.09 | 93.16 |
| F1-score | Variance | 15.03 | 13.60 | 4.37 | 12.19 | 10.01 | 12.07 | 14.83 | 15.50 | 13.60 | 9.17 |
| F1-score | Standard deviation | 3.88 | 3.69 | 2.09 | 3.49 | 3.16 | 3.47 | 3.85 | 3.94 | 3.69 | 3.03 |
BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.
BT dataset-based statistical analysis
| Metric | Statistic | SSA-DTCBiNet | Dense-CNN | BA-MCBM | EDN-SVM | 2D-CNN-CAE | CJHBA-DRN | PatchResNet | DE-MCBM | SPO-MCBM | SEnO-DTCBiNet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | Best | 95.24 | 89.32 | 94.84 | 91.49 | 94.71 | 92.19 | 92.91 | 94.97 | 95.63 | 96.24 |
| Accuracy | Mean | 90.89 | 85.44 | 89.21 | 86.21 | 88.51 | 87.35 | 87.94 | 90.36 | 91.80 | 93.10 |
| Accuracy | Variance | 16.56 | 9.66 | 19.58 | 14.07 | 19.31 | 14.63 | 14.81 | 17.29 | 14.61 | 8.64 |
| Accuracy | Standard deviation | 4.07 | 3.11 | 4.43 | 3.75 | 4.39 | 3.82 | 3.85 | 4.16 | 3.82 | 2.94 |
| Precision | Best | 94.05 | 89.73 | 93.59 | 90.62 | 93.53 | 91.05 | 91.37 | 93.73 | 94.60 | 95.49 |
| Precision | Mean | 90.41 | 85.64 | 88.61 | 86.06 | 88.26 | 87.03 | 87.59 | 90.11 | 91.44 | 93.02 |
| Precision | Variance | 12.50 | 10.48 | 14.22 | 11.83 | 14.31 | 10.83 | 9.82 | 12.12 | 11.31 | 4.85 |
| Precision | Standard deviation | 3.54 | 3.24 | 3.77 | 3.44 | 3.78 | 3.29 | 3.13 | 3.48 | 3.36 | 2.20 |
| Recall | Best | 97.63 | 88.49 | 97.34 | 93.23 | 97.07 | 94.49 | 96.00 | 97.47 | 97.69 | 97.72 |
| Recall | Mean | 91.86 | 85.05 | 90.41 | 86.49 | 89.00 | 88.00 | 88.64 | 90.86 | 92.53 | 93.27 |
| Recall | Variance | 26.47 | 8.24 | 33.81 | 19.70 | 31.94 | 24.11 | 28.37 | 30.91 | 22.62 | 20.79 |
| Recall | Standard deviation | 5.14 | 2.87 | 5.82 | 4.44 | 5.65 | 4.91 | 5.33 | 5.56 | 4.76 | 4.56 |
| F1-score | Best | 95.81 | 89.11 | 95.43 | 91.91 | 95.27 | 92.74 | 93.63 | 95.56 | 96.12 | 96.59 |
| F1-score | Mean | 91.12 | 85.34 | 89.49 | 86.27 | 88.62 | 87.50 | 88.09 | 90.47 | 91.98 | 93.12 |
| F1-score | Variance | 18.71 | 9.27 | 22.52 | 15.29 | 22.03 | 16.68 | 17.59 | 20.20 | 16.37 | 11.21 |
| F1-score | Standard deviation | 4.33 | 3.04 | 4.75 | 3.91 | 4.69 | 4.08 | 4.19 | 4.49 | 4.05 | 3.35 |
BT, Brain tumor; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.
BraTS 2018 dataset-based statistical analysis
| Metric | Statistic | Dense-CNN | EDN-SVM | CJHBA-DRN | PatchResNet | 2D-CNN-CAE | BA-MCBM | DE-MCBM | SSA-DTCBiNet | SPO-MCBM | SEnO-DTCBiNet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | Best | 89.11 | 89.81 | 91.60 | 92.62 | 93.17 | 93.75 | 94.22 | 94.59 | 94.78 | 95.51 |
| Accuracy | Mean | 84.85 | 85.64 | 87.00 | 87.87 | 88.95 | 89.39 | 89.94 | 90.23 | 90.96 | 92.12 |
| Accuracy | Variance | 7.40 | 7.53 | 11.51 | 12.33 | 12.91 | 13.83 | 15.52 | 15.21 | 14.40 | 11.72 |
| Accuracy | Standard deviation | 2.72 | 2.74 | 3.39 | 3.51 | 3.59 | 3.72 | 3.94 | 3.90 | 3.79 | 3.42 |
| Precision | Best | 90.35 | 91.25 | 91.85 | 93.03 | 93.66 | 94.04 | 94.56 | 94.83 | 94.95 | 95.37 |
| Precision | Mean | 85.40 | 86.28 | 87.42 | 87.93 | 89.14 | 89.58 | 90.22 | 90.48 | 91.17 | 91.98 |
| Precision | Variance | 10.32 | 10.30 | 11.23 | 12.63 | 13.35 | 13.60 | 15.39 | 14.57 | 15.85 | 14.38 |
| Precision | Standard deviation | 3.21 | 3.21 | 3.35 | 3.55 | 3.65 | 3.69 | 3.92 | 3.82 | 3.98 | 3.79 |
| Recall | Best | 86.64 | 86.95 | 91.10 | 91.81 | 92.18 | 93.16 | 93.54 | 94.13 | 94.43 | 95.80 |
| Recall | Mean | 83.76 | 84.37 | 86.17 | 87.73 | 88.57 | 89.02 | 89.36 | 89.73 | 90.55 | 92.40 |
| Recall | Variance | 3.09 | 3.44 | 12.44 | 12.27 | 12.36 | 14.46 | 16.18 | 16.68 | 11.99 | 7.68 |
| Recall | Standard deviation | 1.76 | 1.85 | 3.53 | 3.50 | 3.52 | 3.80 | 4.02 | 4.08 | 3.46 | 2.77 |
| F1-score | Best | 88.46 | 89.05 | 91.48 | 92.42 | 92.92 | 93.60 | 94.05 | 94.48 | 94.69 | 95.58 |
| F1-score | Mean | 84.57 | 85.31 | 86.79 | 87.83 | 88.85 | 89.30 | 89.79 | 90.10 | 90.86 | 92.19 |
| F1-score | Variance | 6.06 | 6.23 | 11.71 | 12.23 | 12.73 | 13.96 | 15.64 | 15.57 | 13.74 | 10.58 |
| F1-score | Standard deviation | 2.46 | 2.50 | 3.42 | 3.50 | 3.57 | 3.74 | 3.95 | 3.95 | 3.71 | 3.25 |
BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.
Real-time dataset-based statistical analysis
| Metric | Statistic | Dense-CNN | EDN-SVM | CJHBA-DRN | PatchResNet | 2D-CNN-CAE | BA-MCBM | DE-MCBM | SSA-DTCBiNet | SPO-MCBM | SEnO-DTCBiNet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | Best | 88.00 | 90.46 | 91.23 | 92.92 | 93.66 | 94.52 | 95.38 | 95.52 | 95.58 | 96.48 |
| | Mean | 84.17 | 85.45 | 86.68 | 87.59 | 88.59 | 89.34 | 90.31 | 90.90 | 91.68 | 93.11 |
| | Variance | 6.30 | 11.08 | 11.89 | 14.74 | 16.74 | 15.91 | 13.40 | 12.30 | 9.84 | 6.77 |
| | Standard deviation | 2.51 | 3.33 | 3.45 | 3.84 | 4.09 | 3.99 | 3.66 | 3.51 | 3.14 | 2.60 |
| Precision | Best | 89.65 | 90.03 | 90.83 | 92.41 | 93.50 | 93.54 | 94.39 | 94.55 | 94.61 | 95.78 |
| | Mean | 84.94 | 85.55 | 86.29 | 87.12 | 88.20 | 88.70 | 89.83 | 90.54 | 90.71 | 92.31 |
| | Variance | 9.44 | 9.69 | 10.04 | 13.57 | 17.21 | 13.70 | 10.50 | 10.22 | 9.99 | 6.81 |
| | Standard deviation | 3.07 | 3.11 | 3.17 | 3.68 | 4.15 | 3.70 | 3.24 | 3.20 | 3.16 | 2.61 |
| Recall | Best | 84.71 | 91.33 | 92.04 | 93.96 | 93.96 | 96.47 | 97.34 | 97.45 | 97.51 | 97.87 |
| | Mean | 82.62 | 85.25 | 87.47 | 88.54 | 89.36 | 90.61 | 91.25 | 91.62 | 93.62 | 94.71 |
| | Variance | 1.98 | 14.53 | 16.85 | 17.50 | 15.99 | 20.95 | 20.62 | 17.64 | 9.58 | 6.74 |
| | Standard deviation | 1.41 | 3.81 | 4.10 | 4.18 | 4.00 | 4.58 | 4.54 | 4.20 | 3.09 | 2.60 |
| F1-score | Best | 87.73 | 90.61 | 91.35 | 92.75 | 93.31 | 94.24 | 95.48 | 96.31 | 96.53 | 96.63 |
| | Mean | 84.76 | 85.81 | 86.73 | 87.59 | 88.29 | 88.96 | 89.73 | 90.61 | 91.09 | 93.16 |
| | Variance | 4.37 | 10.01 | 12.19 | 15.03 | 12.07 | 13.60 | 14.83 | 15.50 | 13.60 | 9.17 |
| | Standard deviation | 2.09 | 3.16 | 3.49 | 3.88 | 3.47 | 3.69 | 3.85 | 3.94 | 3.69 | 3.03 |
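The best, mean, variance, and standard-deviation entries in the tables above summarize scores over repeated evaluation runs. A minimal sketch of how such a summary can be computed; the per-run scores below are hypothetical, and the use of the sample (n−1) rather than population variance is an assumption:

```python
import statistics

# Hypothetical per-run accuracy scores for one model (illustrative only)
runs = [88.0, 90.4, 91.2, 92.9, 93.7]

best = max(runs)
mean = statistics.mean(runs)
variance = statistics.variance(runs)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(runs)      # sample standard deviation
```

By construction the standard deviation is the square root of the variance, which is why the variance and standard-deviation rows in the tables move together.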
A t-test is performed to compare the means of the different methods; a p-value below 0.05 indicates a statistically significant difference between the results of the compared models. Table 8 presents the t-test analysis on the BraTS 2020 dataset, Table 9 on the BT dataset, Table 10 on the BraTS 2018 dataset, and Table 11 on the real-time dataset. The results indicate that the model improves BT classification performance over the baseline models across the different datasets.
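The t-statistics in Tables 8–11 compare per-run score distributions between model pairs. A minimal, self-contained sketch of a two-sample (Welch) t-statistic; the per-run accuracies are hypothetical, and the exact t-test variant used in the tables is not specified here:

```python
import math

def welch_t(sample_a, sample_b):
    """Two-sample Welch t-statistic and degrees of freedom.

    The two-sided p-value would follow from the Student-t survival
    function, e.g. 2 * scipy.stats.t.sf(abs(t), df).
    """
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    se2 = var_a / na + var_b / nb  # squared standard error of the mean difference
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((var_a / na) ** 2 / (na - 1) + (var_b / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical per-run accuracies for a proposed model and a baseline
proposed = [95.5, 96.2, 96.5, 95.9, 96.8]
baseline = [93.1, 93.7, 94.0, 92.8, 93.5]
t_stat, dof = welch_t(proposed, baseline)
```

Welch's form is used here because it does not assume equal variances across the two models; `scipy.stats.ttest_ind(..., equal_var=False)` computes the same quantity together with the p-value.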
T-test analysis based on the BraTS 2020 dataset
| Methods | Accuracy p-value | Accuracy T-statistic | Precision p-value | Precision T-statistic | Recall p-value | Recall T-statistic | F1-score p-value | F1-score T-statistic |
|---|---|---|---|---|---|---|---|---|
| EDN-SVM | 0.12 | 2.20 | 0.10 | 2.39 | 0.16 | 1.87 | 0.13 | 2.11 |
| Dense-CNN | 0.09 | 2.42 | 0.09 | 2.53 | 0.12 | 2.19 | 0.10 | 2.36 |
| CJHBA-DRN | 0.11 | 2.21 | 0.11 | 2.27 | 0.12 | 2.13 | 0.12 | 2.19 |
| PatchResNet | 0.11 | 2.24 | 0.12 | 2.14 | 0.10 | 2.40 | 0.11 | 2.28 |
| BA-MCBM | 0.13 | 2.11 | 0.11 | 2.25 | 0.18 | 1.76 | 0.13 | 2.04 |
| 2D-CNN-CAE | 0.13 | 2.07 | 0.11 | 2.22 | 0.19 | 1.70 | 0.14 | 2.00 |
| DE-MCBM | 0.14 | 2.02 | 0.13 | 2.11 | 0.16 | 1.86 | 0.14 | 1.98 |
| SSA-DTCBiNet | 0.12 | 2.20 | 0.11 | 2.22 | 0.12 | 2.16 | 0.12 | 2.19 |
| SPO-MCBM | 0.11 | 2.21 | 0.12 | 2.17 | 0.11 | 2.28 | 0.11 | 2.22 |
| SEnO-DTCBiNet | 0.08 | 2.58 | 0.11 | 2.24 | 0.06 | 3.00 | 0.07 | 2.71 |
T-test analysis based on the BT dataset
| Methods | Precision p-value | Precision T-statistic | F1-score p-value | F1-score T-statistic | Accuracy p-value | Accuracy T-statistic | Recall p-value | Recall T-statistic |
|---|---|---|---|---|---|---|---|---|
| DE-MCBM | 0.09 | 2.54 | 0.10 | 2.36 | 0.09 | 2.41 | 0.11 | 2.24 |
| EDN-SVM | 0.14 | 1.98 | 0.14 | 1.96 | 0.14 | 1.97 | 0.15 | 1.92 |
| CJHBA-DRN | 0.10 | 2.37 | 0.11 | 2.25 | 0.11 | 2.29 | 0.12 | 2.17 |
| PatchResNet | 0.09 | 2.42 | 0.11 | 2.22 | 0.11 | 2.28 | 0.13 | 2.08 |
| Dense-CNN | 0.13 | 2.05 | 0.12 | 2.20 | 0.12 | 2.15 | 0.10 | 2.35 |
| BA-MCBM | 0.10 | 2.38 | 0.11 | 2.25 | 0.11 | 2.29 | 0.12 | 2.15 |
| 2D-CNN-CAE | 0.10 | 2.31 | 0.12 | 2.16 | 0.12 | 2.20 | 0.13 | 2.04 |
| SSA-DTCBiNet | 0.08 | 2.62 | 0.08 | 2.56 | 0.08 | 2.58 | 0.09 | 2.51 |
| SPO-MCBM | 0.07 | 2.77 | 0.07 | 2.71 | 0.07 | 2.73 | 0.08 | 2.66 |
| SEnO-DTCBiNet | 0.10 | 2.31 | 0.07 | 2.70 | 0.08 | 2.62 | 0.07 | 2.83 |
BraTS 2018 dataset-based T-test analysis
| Methods | Precision p-value | Precision T-statistic | Accuracy p-value | Accuracy T-statistic | F1-score p-value | F1-score T-statistic | Recall p-value | Recall T-statistic |
|---|---|---|---|---|---|---|---|---|
| SSA-DTCBiNet | 0.07 | 2.73 | 0.07 | 2.72 | 0.07 | 2.72 | 0.07 | 2.69 |
| EDN-SVM | 0.13 | 2.08 | 0.13 | 2.09 | 0.13 | 2.10 | 0.13 | 2.10 |
| CJHBA-DRN | 0.10 | 2.37 | 0.11 | 2.23 | 0.12 | 2.15 | 0.15 | 1.94 |
| PatchResNet | 0.11 | 2.23 | 0.11 | 2.29 | 0.10 | 2.31 | 0.10 | 2.37 |
| Dense-CNN | 0.17 | 1.80 | 0.17 | 1.78 | 0.18 | 1.77 | 0.19 | 1.68 |
| BA-MCBM | 0.08 | 2.66 | 0.08 | 2.67 | 0.08 | 2.67 | 0.08 | 2.67 |
| DE-MCBM | 0.07 | 2.78 | 0.07 | 2.75 | 0.07 | 2.73 | 0.08 | 2.66 |
| 2D-CNN-CAE | 0.08 | 2.64 | 0.08 | 2.68 | 0.07 | 2.69 | 0.07 | 2.72 |
| SPO-MCBM | 0.07 | 2.78 | 0.07 | 2.72 | 0.08 | 2.68 | 0.08 | 2.54 |
| SEnO-DTCBiNet | 0.07 | 2.72 | 0.07 | 2.73 | 0.07 | 2.72 | 0.08 | 2.66 |
Real-time dataset-based T-test analysis
| Methods | Precision p-value | Precision T-statistic | F1-score p-value | F1-score T-statistic | Accuracy p-value | Accuracy T-statistic | Recall p-value | Recall T-statistic |
|---|---|---|---|---|---|---|---|---|
| PatchResNet | 0.10 | 2.30 | 0.10 | 2.35 | 0.10 | 2.34 | 0.10 | 2.39 |
| EDN-SVM | 0.11 | 2.25 | 0.14 | 2.03 | 0.13 | 2.10 | 0.17 | 1.83 |
| CJHBA-DRN | 0.11 | 2.29 | 0.09 | 2.51 | 0.09 | 2.45 | 0.08 | 2.63 |
| SPO-MCBM | 0.09 | 2.44 | 0.10 | 2.39 | 0.10 | 2.41 | 0.10 | 2.34 |
| 2D-CNN-CAE | 0.10 | 2.35 | 0.10 | 2.34 | 0.10 | 2.34 | 0.10 | 2.32 |
| BA-MCBM | 0.10 | 2.32 | 0.11 | 2.25 | 0.11 | 2.27 | 0.12 | 2.19 |
| DE-MCBM | 0.11 | 2.29 | 0.12 | 2.20 | 0.11 | 2.23 | 0.12 | 2.12 |
| SSA-DTCBiNet | 0.09 | 2.44 | 0.12 | 2.18 | 0.11 | 2.26 | 0.14 | 1.96 |
| Dense-CNN | 0.13 | 2.04 | 0.14 | 1.97 | 0.14 | 2.00 | 0.17 | 1.79 |
| SEnO-DTCBiNet | 0.12 | 2.13 | 0.12 | 2.17 | 0.12 | 2.16 | 0.11 | 2.22 |
The increase in mortality due to BT has motivated the development of effective systems that detect and classify BT types according to their origin. Although the conventional approaches discussed here provide promising outcomes, their performance is hindered by increased complexity and inconsistent behavior. The Dense-CNN [1], owing to its sophisticated architecture, suffers from increased processing time and limited feature-detection capability. The EDN-SVM [11] leveraged a simple machine-learning architecture, but its inability to handle structural patterns caused overfitting and computational inconsistencies. The CJHBA-DRN [4] was limited in recognizing the diverse features essential for discriminating among complex brain structures, leading to misclassification and diminished performance. The PatchResNet [7] combined different layered patches with pretrained models to detect BT; however, it lacked optimization of the learning parameters, which reduced classification accuracy. The 2D-CNN-CAE [15] suffered from overfitting and underfitting because it relied on single-domain data for classification. The BA-MCBM [29] and DE-MCBM [30], owing to the characteristics of their respective optimizers, were prone to premature convergence and entrapment in local optima. The proposed SEnO-DTCBiNet tackles these drawbacks by combining the OFA-based DW-Net for precise segmentation of tumor regions with deep BiLSTM layers for contextual understanding of complex structures. Additionally, the SEnO algorithm is used for hyperparameter tuning. These strategies allow the proposed SEnO-DTCBiNet to obtain improved classification results. The performance of the SEnO-DTCBiNet compared with other approaches, based on training percentage and K-fold cross-validation on the BraTS 2018, BraTS 2020, BT, and real-time datasets, is presented in Tables 12 and 13.
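The accuracy, precision, recall, and F1-score reported throughout follow the standard confusion-matrix definitions; a minimal sketch with hypothetical counts for one tumor class (one-vs-rest):

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics for a binary (one-vs-rest) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)                       # of predicted positives, how many are correct
    recall = tp / (tp + fn)                          # of true positives, how many are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return accuracy, precision, recall, f1

# Hypothetical confusion-matrix counts (illustrative only)
acc, prec, rec, f1 = classification_metrics(tp=90, fp=5, fn=4, tn=101)
```

For the multiclass BT setting (glioma, meningioma, pituitary), these per-class values would typically be averaged, e.g. macro- or weighted-averaged, though the averaging scheme used in the tables is not stated here.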
Results of the SEnO-DTCBiNet and baseline models based on training percentage
| Dataset | Metric | DE-MCBM | PatchResNet | EDN-SVM | Dense-CNN | 2D-CNN-CAE | CJHBA-DRN | BA-MCBM | SSA-DTCBiNet | SPO-MCBM | SEnO-DTCBiNet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BraTS 2020 dataset | Precision (%) | 91.11 | 87.90 | 85.33 | 83.42 | 89.36 | 87.33 | 90.22 | 90.67 | 95.49 | 97.30 |
| | Recall (%) | 96.24 | 93.73 | 92.92 | 91.90 | 95.48 | 93.37 | 96.07 | 96.15 | 97.77 | 97.88 |
| | Accuracy (%) | 93.67 | 89.89 | 87.40 | 86.19 | 90.72 | 88.81 | 91.24 | 92.46 | 96.53 | 97.17 |
| | F1-score (%) | 93.61 | 90.72 | 88.96 | 87.45 | 92.32 | 90.25 | 93.05 | 93.33 | 96.62 | 97.25 |
| BT dataset | Precision (%) | 94.39 | 89.04 | 86.73 | 85.09 | 90.12 | 86.74 | 90.57 | 95.55 | 96.71 | 96.86 |
| | Recall (%) | 96.15 | 93.14 | 92.51 | 91.74 | 94.74 | 92.97 | 95.83 | 97.33 | 98.52 | 98.72 |
| | Accuracy (%) | 94.30 | 91.35 | 89.05 | 88.57 | 91.78 | 91.13 | 91.82 | 96.11 | 97.91 | 98.48 |
| | F1-score (%) | 95.26 | 91.05 | 89.53 | 88.29 | 92.37 | 89.75 | 93.13 | 96.43 | 97.61 | 97.98 |
| BraTS 2018 dataset | Precision (%) | 93.14 | 91.65 | 89.95 | 89.33 | 92.76 | 90.09 | 93.08 | 93.49 | 93.89 | 94.16 |
| | Recall (%) | 94.75 | 92.82 | 90.60 | 90.15 | 93.01 | 92.72 | 94.06 | 95.45 | 97.03 | 97.42 |
| | Accuracy (%) | 95.59 | 94.05 | 89.91 | 88.16 | 94.13 | 91.21 | 94.30 | 95.69 | 96.80 | 97.76 |
| | F1-score (%) | 93.94 | 92.23 | 90.27 | 89.74 | 92.88 | 91.39 | 93.56 | 94.46 | 95.43 | 95.76 |
| Real-time dataset | Precision (%) | 94.80 | 93.27 | 91.62 | 90.45 | 93.96 | 91.73 | 94.24 | 95.29 | 95.31 | 95.99 |
| | Recall (%) | 95.88 | 94.57 | 92.93 | 92.56 | 94.88 | 93.85 | 95.13 | 96.89 | 97.03 | 97.10 |
| | Accuracy (%) | 95.16 | 93.70 | 92.06 | 91.15 | 94.27 | 92.44 | 94.54 | 95.82 | 95.88 | 96.36 |
| | F1-score (%) | 95.34 | 93.92 | 92.27 | 91.49 | 94.42 | 92.78 | 94.68 | 96.08 | 96.16 | 96.54 |
Results of the SEnO-DTCBiNet model and comparative methods based on K-fold
| Dataset | Metric | 2D-CNN-CAE | DE-MCBM | Dense-CNN | CJHBA-DRN | PatchResNet | EDN-SVM | BA-MCBM | SSA-DTCBiNet | SPO-MCBM | SEnO-DTCBiNet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BraTS 2020 dataset | Precision (%) | 92.50 | 93.65 | 87.57 | 89.81 | 92.24 | 89.33 | 93.22 | 95.11 | 95.39 | 95.53 |
| | Recall (%) | 94.14 | 97.38 | 87.89 | 92.94 | 93.26 | 91.92 | 95.28 | 97.55 | 97.69 | 97.75 |
| | Accuracy (%) | 93.05 | 94.89 | 87.68 | 90.85 | 92.58 | 90.19 | 93.91 | 95.92 | 96.16 | 96.27 |
| | F1-score (%) | 93.31 | 95.48 | 87.73 | 91.35 | 92.75 | 90.61 | 94.24 | 96.31 | 96.53 | 96.63 |
| BT dataset | Precision (%) | 93.53 | 93.73 | 89.73 | 91.05 | 91.37 | 90.62 | 93.59 | 94.05 | 94.60 | 95.49 |
| | Recall (%) | 97.07 | 97.47 | 88.49 | 94.49 | 96.00 | 93.23 | 97.34 | 97.63 | 97.69 | 97.72 |
| | Accuracy (%) | 94.71 | 94.97 | 89.32 | 92.19 | 92.91 | 91.49 | 94.84 | 95.24 | 95.63 | 96.24 |
| | F1-score (%) | 95.27 | 95.56 | 89.11 | 92.74 | 93.63 | 91.91 | 95.43 | 95.81 | 96.12 | 96.59 |
| BraTS 2018 dataset | Precision (%) | 93.66 | 94.56 | 90.35 | 91.85 | 93.03 | 91.25 | 94.04 | 94.83 | 94.95 | 95.37 |
| | Recall (%) | 92.18 | 93.54 | 86.64 | 91.10 | 91.81 | 86.95 | 93.16 | 94.13 | 94.43 | 95.80 |
| | Accuracy (%) | 93.17 | 94.22 | 89.11 | 91.60 | 92.62 | 89.81 | 93.75 | 94.59 | 94.78 | 95.51 |
| | F1-score (%) | 92.92 | 94.05 | 88.46 | 91.48 | 92.42 | 89.05 | 93.60 | 94.48 | 94.69 | 95.58 |
| Real-time dataset | Precision (%) | 93.50 | 94.39 | 89.65 | 90.83 | 92.41 | 90.03 | 93.54 | 94.55 | 94.61 | 95.78 |
| | Recall (%) | 93.96 | 97.34 | 84.71 | 92.04 | 93.96 | 91.33 | 96.47 | 97.45 | 97.51 | 97.87 |
| | Accuracy (%) | 93.66 | 95.38 | 88.00 | 91.23 | 92.92 | 90.46 | 94.52 | 95.52 | 95.58 | 96.48 |
| | F1-score (%) | 93.73 | 95.84 | 87.11 | 91.43 | 93.17 | 90.67 | 94.99 | 95.98 | 96.04 | 96.82 |
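The K-fold evaluation behind Table 13 repeatedly holds out one fold for validation and trains on the remaining folds. A minimal index-partitioning sketch; the fold count and any shuffling or stratification used in the experiments are assumptions not specified here:

```python
def kfold_indices(n, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

    Splits range(n) into k contiguous folds of near-equal size; each fold
    serves once as the validation set while the rest form the training set.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

# Example: 10 samples, 5 folds of 2 validation samples each
folds = list(kfold_indices(10, 5))
```

In practice the per-fold metrics would then be averaged to produce the K-fold rows; `sklearn.model_selection.KFold` (or `StratifiedKFold` for class balance) provides the same partitioning.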
The SEnO-DTCBiNet framework introduced for BT classification leverages sophisticated segmentation and DL strategies, ensuring early and timely diagnosis. The framework incorporates the OFA-based DW-Net for segmenting complex brain structures, leveraging tetra-head attention to discriminate abstractive features and SEnO optimization to enhance feature representation. Moreover, the deep architecture and the BiLSTM networks extract long-range relationships and structural patterns more effectively. Additionally, the SEnO algorithm used for tuning the model's parameters enhances performance by finding the optimal solution. The MSMSD feature-extraction process captures comprehensive tumor characteristics, further improving classification performance. The integration of these strategies allows SEnO-DTCBiNet to differentiate tumor types effectively and to provide an automated diagnostic solution, ensuring early intervention and treatment. When validated on the BraTS 2020 dataset, the proposed SEnO-DTCBiNet showed superior performance, achieving an accuracy of 97.18%, precision of 97.31%, recall of 97.89%, and F1-score of 97.26%, reflecting its proficiency relative to traditional approaches. The framework is nonetheless limited by its opacity and lack of inherent interpretability, and its multicomponent complexity also risks overfitting.
Future work will focus on integrating feature-selection strategies and explainable AI techniques that provide clear visual justifications for the model's decisions, fostering trust and transparency for medical professionals. The SEnO-DTCBiNet framework could also be extended to multimodal images, such as MRI and CT scans, by developing a robust image-fusion strategy; this would unlock richer information from the different imaging modalities, thereby improving diagnostic accuracy and robustness.