
Tetra-ConvBiNet: MRI-based brain tumor classification using distributed tetra head attention-enabled convolutional bidirectional network

Open Access | Jan 2026


I.
Introduction

The brain, a vital organ in the human body, is responsible for managing emotions, thoughts, behaviors, and sensory processing. Its cerebral cortex contains billions of neurons that facilitate signal transmission. Abnormal growth of brain cells gives rise to a brain tumor (BT) [1]. BT is one of the most severe ailments affecting the body and reduces normal human life expectancy. Tumors that develop from brain tissue can spread aggressively to surrounding regions [2], and such abnormalities may ultimately lead to death. BTs are categorized into two types: primary and secondary tumors [3, 4]. A primary tumor originates within the brain tissue itself, often without early symptoms, whereas a metastatic (secondary) BT forms in another part of the body and eventually spreads to the brain [5]. The most widespread BTs are pituitary adenomas, meningiomas, and gliomas. Several endeavors have been made to diagnose BT efficiently by automatically classifying or detecting the disease [6].

Magnetic resonance imaging (MRI) is one of the primary diagnostic tools used for identifying BT. However, due to the intricate structure of the brain, achieving a precise diagnosis remains complex, demanding, and time-intensive [7]. BTs sometimes cause various symptoms, depending on the structure and position of the tumors present, including seizures, balance issues, unusual behavior, confusion, changes in vision, and memory issues [8]. To monitor, treat, analyze, and diagnose the human body, miscellaneous medical imaging techniques are utilized, such as MRI, computerized tomography (CT), X-rays, and ultrasound imaging (UI). Among these, medical professionals mostly prefer MRI because it is noninvasive and does not involve ionizing radiation. Moreover, MRI readily reveals blood circulation in the veins and helps detect abnormal cells [9, 10]. Many machine learning (ML) and deep learning (DL) algorithms use brain MRI images from such datasets for detecting BTs [11]. An MRI scan can produce various image features that can be captured and used to recognize tumor cells, and MRI-based tumor diagnosis offers many modalities for separate classification and segmentation [12]. Early-stage diagnosis and detection of cancer cells is critical, yet it is a challenging task because of the tumor’s formation, shape, and dimensions. ML techniques are time-consuming, so advanced DL techniques are preferred [3].

Numerous methods have been established to identify BTs using various techniques, such as ML and DL. One of the most widely used methods is the convolutional neural network (CNN) because it can accurately classify and recognize images. ML is preferred for the identification of cellular structures, molecular identification, image classification, prognosis of diseases, and tissue segmentation [13]. The process of analyzing, organizing, and collecting medical images has become digitalized. Even with cutting-edge methods, the interpretation of medical images requires more time and reduces classification accuracy [5]. The CNN approach used for classifying multigrade BTs from MRI first segments the images, then augments them, and finally applies pretraining to classify the BT correctly [14]. Tumors can be either small or large, which makes it harder for the model to analyze BTs. A balanced dataset is therefore needed to accurately classify brain images and detect the type of disease; however, some datasets used in research are imbalanced, and this affects the performance of the model [15, 16]. The NeuroNet19 model accurately classifies four types of tumor cells, but it faces challenges related to tumor size and the position of the tumor cells. Artificial neural networks integrating DL and ML have achieved remarkable success in classifying tumor cells and enhancing overall performance [2].

To address these challenges, this study proposes a novel framework named the sonar energy optimized distributed tetra head attention-based convolutional bidirectional network (SEnO-DTCBiNet), which aims to classify BT effectively, ensuring early identification and timely diagnosis and ultimately reducing BT-related fatalities. The MRI images are fed to the classification model, where precise region of interest (ROI) extraction is carried out using a binary thresholding technique to ensure that only the most relevant features are used for analysis. The segmentation process allows precise segmentation of tumor regions, which is crucial for reliable diagnosis. Moreover, the multistructural matrix-based statistical deep flow feature (MSMSD) extraction captures diverse, rich information, including structural patterns, texture, and statistical properties, yielding more discriminative features for classification. The SEnO-DTCBiNet architecture utilizes the extracted features to differentiate between tumor categories. The core components and strengths of this approach are detailed in the following sections.

  • Optimized fused attention-enabled distributed W-Net-based segmentation (OFA-based DW-Net): Integrating tetra-head attention and the SEnO algorithm within the OFA-based DW-Net segmentation process improves accurate tumor region identification. This design combines several attention modules: the bottleneck attention module (BAM) for enhanced feature learning, squeeze and excitation (SE) for channel-specific attention, the ultralightweight subspace attention module (ULSAM) for focusing on essential structures, and criss-cross (CC) attention for dynamic contextual information capture. Additionally, fine-tuning the W-Net through SEnO allows the model to attain precise segmentation outcomes.

  • SEnO-DTCBiNet: The SEnO-DTCBiNet combines distributed tetra head attention (DTHA) and deep BiLSTM layers to capture temporal dependencies and enhance contextual understanding of tumor regions, leading to increased reliability and accuracy in BT classification. Additionally, the SEnO algorithm, obtained by hybridizing bat, dolphin echolocation, and sparrow search characteristics, effectively fine-tunes the model parameters, resulting in higher efficiency and robustness.

  • The structure of this paper is outlined as follows: Section II reviews related literature, Section III describes the system model, Section IV details the methodology for tumor classification, Section V presents the outcomes, and Section VI concludes the study.

II.
Literature Survey

This section reviews the literature, describing the traditional approaches used in this context. Ozkaraca et al. [1] proposed a Dense-CNN approach for classifying BT using MRI data. The model incorporates existing DL networks, such as VGG16 and DenseNet, to process MRI images more effectively. While this approach boosts classification performance, it requires considerable processing time. Anantharajan et al. [11] developed a BT detection method based on DL and ML techniques. After acquisition, the MRI images undergo preprocessing, segmentation, and feature extraction using various techniques. Ultimately, a DL model is employed to detect abnormal brain tissues. The results show that the suggested model achieved good sensitivity, accuracy, and specificity in classifying normal and abnormal tissues, but it faces some concerns in classification, as it considers only grayscale images.

Deepa et al. [4] suggested a Chronological Jaya Honey Badger Algorithm-based Deep Residual Network (CJHBA-DRN) for categorizing BTs based on an optimization method. DeepMRSeg is used for segmenting the images, and the CJHBA algorithm is introduced for training. After that, features are extracted and augmented, and the CJHBA algorithm categorizes the BT using the dataset. Even though this method performs better in detecting BT, it has some challenges: it considers only selected features, which decreases classification performance. Muezzinoglu et al. [7] introduced a PatchResNet-based model to classify BTs using MRI images. In the developed method, two feature extractors are introduced in addition to three feature selectors. The image classification of the model was improved, and the accuracy rate for classifying BT was also enhanced by the iterative hard majority voting (IHMV) technique. The main drawbacks of this model were that the datasets had to be improved and that the absence of a better optimization method reduced the classification accuracy.

Rahman and Islam [13] suggested a DL model to identify BTs from the datasets. Here, the images are first resized, converted to grayscale in the model, and then augmented to reduce complexity issues. Moreover, this PDCNN method extracts both local and global features and uses normalization to avoid the overfitting problem. This method has some limitations in handling the dataset, which has to be enlarged to identify tumors in 3D images. Haque et al. [16] suggested a DL-based NeuroNet19 method to detect BTs. VGG19 with a cascading method was utilized to extract both global and local features of the image. Explainable artificial intelligence (AI), specifically LIME, is employed to increase the model’s accountability, improve classification accuracy, and mitigate false negatives. This method shows outstanding performance in both accuracy and promptness. Its remaining barriers are that it does not detect binary-class tumors and that training could be improved with enlarged datasets.

Babu Vimala et al. [2] introduced a DL model to detect and categorize BTs using EfficientNet. Grad-CAM is utilized in the model to highlight the affected areas and classify the tumorous cells in the brain. The fine-tuned EfficientNetB2 demonstrates the generalization and efficiency of the model through the testing phases and achieves better performance. However, this method requires more training time, which reduces classification speed. Islam et al. [6] proposed a DL model to diagnose BT cells using MRI images. Here, four architectures are utilized to classify BTs: MobileNet, InceptionV3, DenseNet121, and VGG19. The BT cells are first classified through a fine-tuning method, and a comparison is made among the above approaches, from which MobileNet achieves the best accuracy in classifying BT cells. Even though this method enhances classification accuracy, it relies on only a few selected features.

a.
Challenges

The limitations that occurred during the classification of BT in earlier approaches are discussed below.

  • In the suggested model [1], the classification accuracy for detecting the BT is improved, but the processing time required is longer.

  • The EDN-SVM classifier method [11] considers only the gray-scale images to accurately detect the normal and abnormal tissues present in the brain cells, which limits its applicability across diverse types.

  • In [13], the PDCNN method captures both high- and low-level image features, but expanding the dataset is necessary for improved BT cell classification accuracy.

  • The EfficientNetB2 method shows better performance with good efficiency, but it requires more training time, which diminishes detection speed [2].

  • In the developed method [6], only the selected features are utilized by the model, so the classification process depends on the selected region and suppresses the other tissues in the MRI image.

b.
Problem definition

BT causes disability, dreadful illness, and even death, often at an early stage of life. DL models with advanced techniques in the medical field enable the diagnosis of BT at an early stage to save human lives. However, researchers still face many challenges in detecting and diagnosing BT because of the complex nature of the brain. The main barriers in the existing models are lower accuracy, overfitting problems, poor classification performance, and higher time complexity. Therefore, to detect the disease accurately, with more precision, and to effectively enhance tumor grading performance, the SEnO-DTCBiNet model is proposed. The input images collected from the datasets are denoted as:

(1) $D = \left\{ (s_1, y), (s_2, y), \ldots, (s_i, y), \ldots, (s_m, y) \right\}$

where D represents the dataset, $s_i$ implies the ith input image sample, m denotes the total number of samples, and y expresses the tumor type of an input sample. The samples are passed through the ROI extraction stage, in which specific areas are isolated and forwarded to the segmentation module. This step, executed using OFA-based DW-Net, enhances model efficiency and reduces computational load. The segmented samples are combined with the fused attention mechanism, which captures multiple-scale features, providing enhanced feature refinement as well as better localization. Furthermore, the segmentation model extracts the segmented regions from the samples, categorizes the different BT cells in the scanned image samples, and addresses the corresponding variations, so the segmented output G is accurately obtained. Following feature extraction, the resulting feature maps serve as input to the tumor classification model. The SEnO-DTCBiNet architecture is assessed using the categorical cross-entropy (CCE) loss function throughout the training process, which is mathematically defined as:

(2) $Loss_{fun} = -\sum_{a=1}^{n} b_a \log\left(U_a\right)$

where the predicted probability is denoted as $U_a$, the actual label as $b_a$, and n indicates the total number of classes. Additionally, the real class labels from the BT dataset [17] are defined as:

(3) $b_a = \begin{cases} 1, & \text{meningioma} \\ 2, & \text{glioma} \\ 3, & \text{pituitary tumor} \end{cases}$

The actual labels of the BraTS 2020 dataset [18] and BraTS 2018 dataset [19] are specified as:

(4) $b_a = \begin{cases} 1, & \text{necrotic and non-enhancing tumor} \\ 2, & \text{GD-enhancing tumor} \\ 3, & \text{peritumoral edema} \end{cases}$
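For illustration, the CCE loss of Eq. (2) can be computed for a single three-class prediction as in the short Python sketch below (not the authors' code; the probability values are purely hypothetical).

```python
# Minimal sketch: categorical cross-entropy of Eq. (2) for one three-class sample,
# assuming a one-hot ground-truth label b and predicted probabilities u.
import numpy as np

def categorical_cross_entropy(b, u, eps=1e-12):
    """b: one-hot true label, u: predicted class probabilities."""
    u = np.clip(u, eps, 1.0)          # avoid log(0)
    return -np.sum(b * np.log(u))

# Example: true class is "glioma" (index 1), model assigns it probability 0.7.
b = np.array([0.0, 1.0, 0.0])
u = np.array([0.2, 0.7, 0.1])
print(categorical_cross_entropy(b, u))   # ~0.357
```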

III.
System Model

The research implemented for an effective BT classification system begins with collecting diverse MRI images from MRI scanners in laboratories or hospitals. These images are aggregated in BraTS 2020 [18], BraTS 2018 [19], the BT dataset [17], and the real-time dataset, and are then passed to the ROI extraction process to ensure that only the most relevant and important features are considered for analysis. A series of features is then captured to understand the complex structures of tumor regions more precisely, leading to more discriminative features for classification. These features are fed into the proposed classification system to initiate the training phase. Test features are then applied to the trained model to obtain the tumor grading for differentiating the types of BT, supporting timely and early diagnosis.

IV.
Classification of BT Using SEnO-DTCBiNet

This research introduces an approach named the SEnO-DTCBiNet framework, designed for BT detection and classification using MRI scans. Initially, the collected BT images undergo an ROI extraction stage, where significant regions are identified and passed on to the OFA-driven DW-Net architecture. In this model, SEnO optimization, which integrates bat optimization, dolphin echolocation optimization, and sparrow search optimization, is utilized to train the model and enhance its effectiveness. The segmentation phase is carried out on a W-Net model with four fused attention mechanisms for accurate segmentation of the images. Following this, the segmented output proceeds to a dedicated feature extraction unit, where vital attributes are drawn out using the MSMSD model. The MSMSD model includes three distinct feature types: GLCM-derived features, statistical deep flow-based features, and hybrid 3D structural pattern features. Once these MSMSD features are obtained, they are given to the SEnO-DTCBiNet model to classify BTs such as necrotic and non-enhancing tumors, GD-enhancing tumors, and peritumoral edema. A schematic representation of the SEnO-DTCBiNet architecture is shown in Figure 1.

Figure 1:

Block diagram of the SEnO-DTCBiNet framework. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.

a.
Input MRI image collection

The input images are collected from the BraTS 2020 dataset [18], the BT dataset [17], and the BraTS 2018 dataset [19]. Let us assume the dataset is D, given by:

(5) $D = \left\{ s_i \right\}; \quad 1 \le i \le m$

where $s_i$ denotes the ith input image sample with a dimension of [240,240,1], and m denotes the total number of input samples.

b.
Thresholding-based ROI extraction

The collected input image $s_i$ is fed to an ROI extraction phase, which extracts the selected ROI from the image for further processing through a binary thresholding mechanism. The original image is converted into a grayscale image with pixel values ranging over [0,255]. The threshold value $T_r$ is calculated as the average value of the grayscale image. The intensity of each pixel in $s_i$ is compared with the threshold value $T_r$. When the pixel intensity is higher than $T_r$, the binary value is set to “1” and the pixel is taken as part of the interested region; when the pixel intensity is less than $T_r$, the binary value is set to “0” and that region is eliminated. To obtain a clear image, a binary mask is applied that retains the required region. The major purpose of the mechanism is to reduce errors and to enhance classification accuracy. In the ROI extraction phase, the output dimension of the images varies for each slice; hence, the images are resized to [512,512,1], which helps the segmentation process proceed more smoothly. Thus, the resultant output H is obtained from the ROI extraction, which is further transferred to the segmentation phase.
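The thresholding procedure described above can be sketched as follows; this is an illustrative OpenCV/NumPy snippet under our own naming, not the authors' implementation.

```python
# Illustrative sketch of mean-threshold ROI extraction followed by resizing to [512, 512, 1].
import cv2
import numpy as np

def extract_roi(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)   # pixel values in [0, 255]
    tr = gray.mean()                                       # threshold = mean intensity
    mask = (gray > tr).astype(np.uint8)                    # 1 = region of interest, 0 = discarded
    roi = gray * mask                                      # binary mask retains the required region
    roi = cv2.resize(roi, (512, 512))                      # fixed size expected by the segmenter
    return roi[..., np.newaxis]                            # shape [512, 512, 1]
```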

c.
Optimized fused attention-based distributed W-Net for segmentation

Segmentation is employed to enhance the precision of BT classification and support more accurate diagnosis. By segmenting the images, the detection process becomes more efficient, and the overall analysis of the medical images is significantly streamlined. Hence, the segmentation process is performed by an OFA-based DW-Net model for identifying the accurate position of the BT. The image obtained from the ROI extraction step serves as the input for the OFA-based DW-Net method [20]. In general, the W-Net structure is considered, where four attention mechanisms are fused, namely, BAM, SE, ULSAM, and CC attention, in a distributed manner to derive the OFA-based DW-Net segmentation model. Additionally, the model’s segmentation component is fine-tuned through an optimization algorithm that operates at the fifth layer of the framework, with detailed insights provided in Section 4.6.

The OFA-based DW-Net segmentation is built on the W-Net model, which reconstructs the input image and predicts the output map without degrading the information. It is composed of two parts, an encoder and a decoder, with 47 convolutional layers arranged in 18 modules. In the encoder path, the down-sampling process is carried out with the fused attention mechanisms of BAM, SE, ULSAM, and CC. These fused mechanisms are arranged in a distributed manner, and the up-sampling phase follows. Context is captured through the contracting path, the encoder $Z_{enc}$, while the expansive path, useful for precise localization, is the decoder $Z_{dec}$. The encoder part of the segmentation model takes the input H with dimension (N,512,512,1). The encoder module is composed of 2 × 2 convolutional layers, each followed by a dropout layer and a max pooling layer, such that the feature dimension increases at each stage of the down-sampling path; conversely, the dimension is reduced at each layer of the up-sampling path in the decoder module. The OFA-based DW-Net architecture is depicted in Figure 2.

Figure 2:

OFA-based DW-Net model architecture.

The resultant output obtained at the final dropout layer of the down-sampling process at the encoder side is represented as R, having the dimension [N,32,32,1024]; it is further passed to the fused attention framework in a distributed manner, and the concatenated output derived through the attention model has the size [N,32,32,4096]. In the up-sampling process, with the transposed 2D convolutional layers, the feature maps are expanded and combined with the equivalently sized feature maps of the down-sampling process through skip connections. The skip connections are utilized in the up-sampled feature maps to improve region localization. A SoftMax layer is applied after the final convolutional layer in the encoder path to minimize the normalized cut (Ncut) value.

The Ncut loss function is used to eliminate segmentation noise and smooth the layer. The Ncut value is calculated as:

(6) $Ncut_j\left(N\right) = \sum_{j=1}^{J} \frac{cut\left(E_j, N - E_j\right)}{asso\left(E_j, N\right)}$

where N represents the set of all pixel groups, and $E_j$ describes the jth group of pixels in the image. Furthermore, the OFA-based DW-Net model also minimizes a reconstruction loss so that the contracting path retains as much information about the original image as possible. The reconstruction loss is expressed as:

(7) $\psi_{reconst} = \left\| s_i - Z_{dec}\left(Z_{enc}\left(s_i; B_{enc}\right); B_{dec}\right) \right\|_2^2$

where the parameters of the encoder and decoder are represented by $B_{enc}$ and $B_{dec}$, respectively, and the original input image is depicted as $s_i$. Hence, the pixel-set values in the selected region are maximized while the regenerated image is acquired at the same time, boosting the segmentation precision of the image. The main advantage of the OFA-based DW-Net method is that it enhances the accuracy of image segmentation and captures fine details of the features; along with this, the computational cost is mitigated, with better generalization capability, robustness, and good flexibility. These fused attention mechanisms are therefore utilized in this research to provide better performance in disease classification. Based on the segmentation process, nontumor cells are removed, and G represents the segmented tumor cells. The segmented image’s size is [N,512,512,1], and it is further used for the feature extraction process. A brief explanation of the BAM, SE, ULSAM, and CC fused attention mechanisms is given below.

c.i
Bottleneck attention mechanism

BAM [21] is widely used with convolutional network models because it is simple and compact. The major purpose of this attention is to increase the representation capability of the model by considering both spatial and channel attention, which form two separate branches operating on the channels of the feature map. The input feature is taken as R ∈ ℝc×h×w, and the 3D attention map of BAM is noted as A(R) ∈ ℝc×h×w. Figure 3 represents the structure of BAM.

Figure 3:

Bottleneck attention mechanism.

In the channel attention branch, an average pooling layer is applied to the feature map, encoding the information present in each channel. For determining the attention, a multilayer perceptron with one hidden layer is used, and a batch normalization layer is also used to adjust the scale parameters. Hence, the channel attention is mathematically expressed as:

(8) $A_c\left(R\right) = Ba\left(Ml\left(Ap\left(R\right)\right)\right)$

where $Ap(R)$ implies average pooling, $Ml$ is the multilayer perceptron, $Ba$ denotes batch normalization, and $A_c(R)$ denotes channel attention.

Spatial attention is employed to filter out features distributed across different spatial positions. This module utilizes two key hyperparameters: the dilation rate, which helps in capturing contextual information, and the reduction ratio, which manages channel capacity and controls computational overhead. The feature is first reduced using a (1 × 1) convolution, two (3 × 3) dilated convolutions are then applied, and finally the reduced features are convolved again with a (1 × 1) convolution followed by batch normalization in the spatial branch for scale adjustment. Therefore, spatial attention is represented as:

(9) $A_s\left(R\right) = Ba\left(f_3^{1 \times 1}\left(f_2^{3 \times 3}\left(f_1^{3 \times 3}\left(f_0^{1 \times 1}\left(R\right)\right)\right)\right)\right)$

where f represents a convolution operation, $Ba$ implies batch normalization, and $A_s(R)$ denotes the spatial attention. To obtain an efficient module, the channel attention is added to the spatial attention and a sigmoid function is applied, giving the attention map:

(10) $A\left(R\right) = \sigma\left(A_c\left(R\right) + A_s\left(R\right)\right)$

The sigmoid function is denoted by σ, spatial attention by $A_s(R)$, and channel attention by $A_c(R)$. Thus, the BAM output $R_1$ is calculated by adding the original input feature to the attended feature, and it is mathematically expressed as:

(11) $R_1 = R + R \otimes A\left(R\right)$

where ⊗ expresses element-wise multiplication. Thus, the BAM module effectively enhances model performance while mitigating overhead and computational complexity problems.
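To make Eqs. (8)-(11) concrete, the following Python/TensorFlow sketch is our own illustrative code, not the authors' implementation; the reduction ratio and dilation rate are assumptions.

```python
# Hedged sketch of a BAM block: channel branch (Eq. 8), spatial branch (Eq. 9),
# sigmoid fusion (Eq. 10), and residual refinement (Eq. 11).
import tensorflow as tf
from tensorflow.keras import layers

class BAMBlock(tf.keras.layers.Layer):
    def __init__(self, channels, reduction=16, dilation=4):
        super().__init__()
        r = channels // reduction
        # channel branch: global average pool -> MLP -> batch norm
        self.gap = layers.GlobalAveragePooling2D()
        self.mlp = tf.keras.Sequential([layers.Dense(r, activation="relu"),
                                        layers.Dense(channels)])
        self.bn_c = layers.BatchNormalization()
        # spatial branch: 1x1 reduce -> two dilated 3x3 convs -> 1x1 -> batch norm
        self.spatial = tf.keras.Sequential([
            layers.Conv2D(r, 1),
            layers.Conv2D(r, 3, padding="same", dilation_rate=dilation, activation="relu"),
            layers.Conv2D(r, 3, padding="same", dilation_rate=dilation, activation="relu"),
            layers.Conv2D(1, 1)])
        self.bn_s = layers.BatchNormalization()

    def call(self, x):
        ca = self.bn_c(self.mlp(self.gap(x)))[:, None, None, :]   # [B, 1, 1, C]
        sa = self.bn_s(self.spatial(x))                            # [B, H, W, 1]
        attn = tf.sigmoid(ca + sa)                                  # Eq. (10), broadcast to [B, H, W, C]
        return x + x * attn                                         # Eq. (11)

# usage: y = BAMBlock(channels=1024)(tf.random.normal([2, 32, 32, 1024]))
```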

c.ii
SE attention

The working principle of SE attention [22] is based on the convolutional network model, and spatial dependencies are captured by global average pooling. The SE method in image segmentation provides more pixel-wise information about the spatial blocks, which is more accurate. Assume the input of SE is R = {r1, r2 ,..., rn} with n channels, such that rk ∈ ℝh×w, where the spatial height and width are indicated as h and w, respectively. The working mechanism of SE is based on two different frameworks, namely the spatial framework and the channel framework. The spatial model employs global average pooling as a squeeze operation, which generates a vector q by summarizing the spatial information of each feature map:

(12) $q = \frac{1}{h \times w}\sum_{i=1}^{h}\sum_{j=1}^{w} r_k\left(i, j\right)$

Furthermore, the vector q is processed through two fully connected (FC) layers with the ReLU operator. Finally, the output of the FC layers is given to the sigmoid layer to generate $\sigma\left(\hat{q}\right)$:

(13) $\hat{R}_{cSE} = \left[\sigma\left(\hat{q}_1\right) r_1, \sigma\left(\hat{q}_2\right) r_2, \ldots, \sigma\left(\hat{q}_n\right) r_n\right]$

The channel model performs a channel squeeze operation using a convolutional layer; its output is denoted as c and can be represented by c = R × wt, where wt denotes the weight factor and c is the projection tensor. Each projection tensor is further fed to the sigmoid layer to obtain σ(c), and the channel model output is represented by:

(14) $\hat{R}_{sSE} = \left[\sigma\left(c_1\right) r_1, \sigma\left(c_2\right) r_2, \ldots, \sigma\left(c_n\right) r_n\right]$

Also, irrelevant features are reduced, and important spatial locations are emphasized by the recalibration. The spatial and channel squeeze-and-excitation components are integrated into a single block by concatenating the outputs from both branches:

(15) $R_2 = con\left(\hat{R}_{cSE}, \hat{R}_{sSE}\right)$

This combined feature map provides both channel-wise and spatial information, highlights the important features, reduces complexity problems, and its resultant output is denoted as $R_2$. Figure 4 represents the SE structure.

Figure 4:

SE attention. SE, squeeze and excitation.
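To make the two recalibration paths of Eqs. (12)-(15) concrete, the following Python/TensorFlow sketch is our own illustrative code (not the authors' implementation; the reduction factor is an assumption).

```python
# Hedged sketch of a concurrent spatial/channel SE block.
import tensorflow as tf
from tensorflow.keras import layers

class SCSEBlock(tf.keras.layers.Layer):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # channel excitation path (Eqs. 12-13): squeeze by GAP, excite by two FC layers
        self.gap = layers.GlobalAveragePooling2D()
        self.fc1 = layers.Dense(channels // reduction, activation="relu")
        self.fc2 = layers.Dense(channels, activation="sigmoid")
        # spatial excitation path (Eq. 14): channel squeeze with a 1x1 convolution
        self.conv = layers.Conv2D(1, 1, activation="sigmoid")

    def call(self, x):
        q = self.fc2(self.fc1(self.gap(x)))[:, None, None, :]   # sigma(q_hat), [B, 1, 1, C]
        cse = x * q                                              # channel-recalibrated map
        sse = x * self.conv(x)                                   # spatially recalibrated map
        return tf.concat([cse, sse], axis=-1)                    # Eq. (15): concatenate both

# usage: y = SCSEBlock(channels=1024)(tf.random.normal([2, 32, 32, 1024]))
```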

c.iii
Ultralightweight subspace attention mechanism

The structure of ULSAM [23] is designed with convolution and pooling layers such that it learns an attention map in each feature subspace, which minimizes channel and spatial redundancy. Since the attention map is learned by the model, it enhances image segmentation and efficiently learns cross-channel feature-map interdependencies. Let us consider a feature map R ∈ ℜb×h×w from the dropout layer, where b is the number of input channels and h × w represents the spatial dimension of the map. The major goal is to capture cross-channel interdependencies in the feature map effectively. The feature map R is divided into d groups [R1, R2 ,..., Rn ,..., Rd]. In ULSAM, the features of each group Rn are processed through a depth-wise convolution layer, a max pooling layer, and a point-wise convolution layer. The structure of ULSAM is depicted in Figure 5.

Figure 5:

Structure of ULSAM.

An attention map $Attn_n$ captures each group’s nonlinear dependencies to extract the cross-channel information in the feature map. A gating function with the SoftMax activation weighs the attention and refines the feature map. The output $R_3$ is obtained by joining the refined feature maps of every group:

(16) $Attn_n = soft\max\left(PC\left(MP\left(DC\left(R\right)\right)\right)\right)$

(17) $\hat{R}_n = \left(Attn_n \otimes R_n\right) \oplus R_n$

(18) $R_3 = concat\left(\hat{R}_1, \hat{R}_2, \ldots, \hat{R}_n, \ldots, \hat{R}_d\right)$

where softmax denotes the SoftMax activation, PC signifies point-wise convolution, MP represents a max pooling layer, DC shows depth-wise convolution, $Attn_n$ is the attention map of each group, $\hat{R}_n$ denotes the refined feature map of the nth group, and ⊕ implies element-wise addition.
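A hedged Python/TensorFlow sketch of Eqs. (16)-(18) follows; it is our own illustrative code, and the number of subspaces is an assumption.

```python
# Illustrative ULSAM sketch: the channels are split into g subspaces and each
# subspace gets its own lightweight attention map.
import tensorflow as tf
from tensorflow.keras import layers

class ULSAMBlock(tf.keras.layers.Layer):
    def __init__(self, groups=4):
        super().__init__()
        self.groups = groups
        # one lightweight branch per subspace: depth-wise conv -> max pool -> point-wise conv
        self.dw = [layers.DepthwiseConv2D(3, padding="same") for _ in range(groups)]
        self.mp = [layers.MaxPooling2D(pool_size=3, strides=1, padding="same") for _ in range(groups)]
        self.pw = [layers.Conv2D(1, 1) for _ in range(groups)]

    def call(self, x):
        parts = tf.split(x, self.groups, axis=-1)          # split channels into g subspaces
        outs = []
        for i, r_n in enumerate(parts):
            a = self.pw[i](self.mp[i](self.dw[i](r_n)))    # Eq. (16), before the softmax
            shape = tf.shape(a)
            a = tf.nn.softmax(tf.reshape(a, [shape[0], -1]), axis=-1)  # softmax over spatial positions
            a = tf.reshape(a, shape)
            outs.append(a * r_n + r_n)                      # Eq. (17): refine and keep the residual
        return tf.concat(outs, axis=-1)                     # Eq. (18)

# usage: y = ULSAMBlock(groups=4)(tf.random.normal([2, 32, 32, 1024]))
```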

c.iv
CC attention

The CC attention [24] model is mainly used to capture contextual information from the images in an optimal way, owing to its lightweight computation and memory efficiency in modeling local feature representations. This attention mechanism collects vertical and horizontal contextual information to improve the pixel-wise description. The structure of CC attention is presented in Figure 6.

Figure 6:

Structural diagram of CC. CC, criss-cross.

Consider a local feature map R ∈ ℝc×w×h, which is fed into convolution layers to generate the feature maps J, L, and X, with {J, L} ∈ ℝc′×w×h, where c′ denotes the number of channels. An attention map is generated after acquiring the feature maps J and L through an affinity operation. At each position τ in the spatial dimension, the feature vectors generated from J and L are indicated as $J_\tau$ and $L_\tau$, respectively. Hence, the affinity operation is expressed as $A_\tau = J_\tau L_\tau$, which refers to the degree of correlation among the feature vectors $J_\tau$ and $L_\tau$. Furthermore, a SoftMax operation is applied to $A_\tau$ to compute the attention map AM. Moreover, a convolution layer with a [1 × 1] filter is applied over the input feature to create X for the feature adaptation process, and the respective feature vector is denoted as $X_\tau$.

(19) $R_4 = \sum_\tau AM_\tau x_\tau + R$

where $AM_\tau$ is the attention map’s output, and $R_4$ represents the output of the CC attention model. Thus, CC attention captures contextual information pixel-wise. The outputs of the four fused attention mechanisms are then concatenated, and the resultant output Q is passed as input to the feature extraction phase:

(20) $Q = \left\{R_1 \| R_2 \| R_3 \| R_4\right\}$
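A full criss-cross implementation is more involved; the following NumPy sketch (our own illustrative code) shows only the core affinity-and-aggregation step of Eq. (19) for a single spatial position.

```python
# Simplified criss-cross affinity for one position (i, j): attention is computed
# over the pixels sharing its row or column, then used to reweight the adapted features X.
import numpy as np

def criss_cross_at(J, L, X, i, j):
    """J, L, X: feature maps of shape [H, W, C']; returns the aggregated context at (i, j)."""
    keys = np.concatenate([L[i, :, :], L[:, j, :]], axis=0)   # criss-cross set, [(H + W), C']
    vals = np.concatenate([X[i, :, :], X[:, j, :]], axis=0)
    energy = keys @ J[i, j, :]                                # affinity A_tau
    attn = np.exp(energy - energy.max())
    attn /= attn.sum()                                        # softmax -> attention map AM
    return attn @ vals                                        # weighted aggregation; Eq. (19) adds R afterwards

# usage with random maps:
H, W, C = 8, 8, 16
J, L, X = (np.random.rand(H, W, C) for _ in range(3))
print(criss_cross_at(J, L, X, 3, 5).shape)   # (16,)
```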

d.
Extraction of MSMSD

The MSMSD-based feature extraction strategy incorporates multiple feature types, including statistical deep flow features, GLCM-based features, and 3D structural pattern features. This approach helps minimize computational complexity and enhances the model’s accuracy.

Hybrid structural pattern features encompass a range of descriptors, including local binary pattern (LBP), local ternary pattern (LTP), and local gradient pattern (LGP).

Effective features are acquired from the images using the statistical approach known as LBP [25]. It considers only the local features of the ROI-extracted images, and it is mathematically expressed as:

(21) $LBP\left(Q\right) = \sum_z \alpha\left(P_z - P_x\right)2^z$

where LBP(Q) denotes the LBP features with dimension [N,120,120,1], $P_x$ represents the center pixel value, $P_z$ implies the zth neighbor pixel, and α denotes a constant. The LTP, an extension of LBP, is a three-valued texture descriptor that assigns values from −1 to 1 when determining the pixel values. The LTP is expressed as:

(22) $LTP\left(Q\right) = \begin{cases} 1, & \text{if } P_z - P_x > \eta \\ 0, & \text{if } \left|P_z - P_x\right| \le \eta \\ -1, & \text{if } P_z - P_x < -\eta \end{cases}$

where LTP(Q) represents the LTP features with dimension [N,120,120,1], and η is the threshold value. The LTP feature extraction method is utilized because of its classification accuracy and its ability to extract the texture features of an image. The LGP uses the gradients of adjacent pixels in the quantified pixel values. The LGP method [26] enhances the interpretability of the model when extracting features for classification. The expression for LGP is:

(23) $LGP\left(Q\right) = \sum_n \alpha\left(v_n - \bar{v}\right)2^n$

where LGP(Q) implies the LGP features with dimension [N,120,120,1], $\bar{v}$ denotes the mean gradient, and $v_n$ expresses the average value of the central pixel and its adjacent pixels. Therefore, the hybrid 3D structural pattern-based features are represented by concatenating all the extracted features:

(24) $\beta = \left\{LGP\left(Q\right) \| LBP\left(Q\right) \| LTP\left(Q\right)\right\}$

β represents the extracted hybrid 3D structural pattern-based features with dimension [N × 120 × 120 × 3].
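For illustration, the following NumPy sketch (our own code; the 3 × 3 window, the threshold η, and the LGP variant are assumptions) computes the three local pattern codes of Eqs. (21)-(23) for a single pixel neighbourhood.

```python
# Local pattern codes for one pixel and its 8-neighbourhood.
import numpy as np

def local_patterns(patch, eta=5.0):
    """patch: 3x3 grayscale window (float); returns (LBP, LTP, LGP) for its centre pixel."""
    centre = patch[1, 1]
    neigh = np.delete(patch.flatten(), 4)                     # the 8 surrounding pixels
    weights = 2 ** np.arange(8)
    lbp = int(np.sum((neigh >= centre) * weights))            # Eq. (21): thresholded sign, weighted by 2^z
    ltp = np.where(neigh - centre > eta, 1,
                   np.where(neigh - centre < -eta, -1, 0))    # Eq. (22): ternary code per neighbour
    grad = np.abs(neigh - centre)
    lgp = int(np.sum((grad >= grad.mean()) * weights))        # Eq. (23): gradients compared to their mean
    return lbp, ltp, lgp

patch = np.array([[10, 12, 11], [9, 10, 14], [8, 10, 13]], dtype=float)
print(local_patterns(patch))
```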

Statistical deep flow-based features are used to isolate the ROI from the extracted image, allowing the generation of a deep flow feature map. The statistical features [27] improve classification accuracy and make the model more reliable. The statistical deep flow feature uses the “cv2.calcOpticalFlowPyrLK” function to acquire the deep flow feature, which is mathematically denoted as φi. The statistical features derived from it include mean, standard deviation, variance, skewness, and kurtosis. Table 1 demonstrates the characteristics obtained from the deep flow feature representation.

Table 1:

Features based on statistical deep flow

Features | Description | Mathematical notation | Output size
Mean | The ratio between the total intensity of all pixels and the total number of pixels within the deep flow feature image. | $T_1 = \frac{1}{r}\sum_{i=1}^{r}\varphi_i$, where $\varphi_i$ is the feature and r is the total number of images. | [N,120,120,1]
Kurtosis | Kurtosis is defined as the shape of the selected images taken for the statistical measurement. | $T_2 = \frac{\sum\left(T_1 - \overline{T_1}\right)^4}{r\left(T_3\right)}$, where $T_2$ is the kurtosis. | [N,120,120,1]
Standard deviation | The standard deviation is defined as the square root of the variance and represents the average deviation of each pixel intensity from the mean. | $T_3 = \sqrt{\frac{1}{r}\sum_{i=1}^{r}\left(\varphi_i - T_1\right)^2}$, where $T_3$ is the standard deviation. | [N,120,120,1]
Skew | Skewness is defined as a measure of the symmetry of the image. | $T_4 = \frac{1}{r}\sum_{i=1}^{r}\left[\frac{\varphi_i - T_1}{T_3}\right]$, where $T_4$ is the skew. | [N,120,120,1]
Variance | Variance is defined as the square of the standard deviation. | $T_5 = \left(T_3\right)^2 = \frac{1}{r}\sum_{i=1}^{r}\left(\varphi_i - T_1\right)^2$, where $T_5$ is the variance. | [N,120,120,1]

Eq. (25) demonstrates the statistical deep-flow-based features with dimensionality [N,120,120,5]:

(25) $\theta = \left\{T_1 \| T_2 \| T_3 \| T_4 \| T_5\right\}$
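As an illustration, the five statistics of Table 1 can be computed over a deep-flow feature vector as in the NumPy/SciPy sketch below; how the flow map itself is obtained with cv2.calcOpticalFlowPyrLK is omitted, and the function name is our own.

```python
# Hedged sketch: the statistics of Table 1 over one deep-flow feature vector.
import numpy as np
from scipy import stats

def deep_flow_statistics(phi):
    """phi: 1-D array of deep-flow feature responses for one image."""
    return {
        "mean": phi.mean(),                 # T1
        "kurtosis": stats.kurtosis(phi),    # T2
        "std": phi.std(),                   # T3
        "skew": stats.skew(phi),            # T4
        "variance": phi.var(),              # T5
    }

print(deep_flow_statistics(np.random.rand(1000)))
```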

The GLCM [28] method is preferred because of its accuracy and its shorter computation time for feature extraction. It comprises entropy, energy, contrast, homogeneity, and dissimilarity. The GLCM features are tabulated in Table 2.

Table 2:

Features based on GLCM

Features | Overview | Formula | Dimensions of outputs
Homogeneity | Homogeneity calculates the similarity of the texture in the distributed gray-level object pairs. | $E_3 = \sum_{kl}^{n-1} \frac{M_{kl}}{1 + \left|k - l\right|}$, where the homogeneity feature is represented as $E_3$. | [N,120,120,1]
Energy | Energy is used to calculate the uniformity of an image. | $E_1 = \sum_{kl}^{n-1}\left(M_{kl}\right)^2$, where $E_1$ is the energy feature, and the GLCM of the image Q is denoted as $M_{kl}$. | [N,120,120,1]
Entropy | Entropy reflects the complexity of an image as captured by the GLCM. | $E_4 = \sum_{kl}^{n-1} M_{kl}\log_2\left(M_{kl}\right)$, where $E_4$ represents the entropy. | [N,120,120,1]
Contrast | Contrast calculates the amount of local variation present in the image. | $E_5 = \sum_{kl}^{n-1} M_{kl}\left(k - l\right)^2$, where $E_5$ depicts the contrast. | [N,120,120,1]
Dissimilarity | It measures the gaps between the mean variances and the ROI in the gray-scale image. | $E_2 = \sum_{kl}^{n-1} M_{kl}\left|k - l\right|$, where $E_2$ denotes the dissimilarity feature of the GLCM. | [N,120,120,1]

ROI, region of interest.

The equation below represents the extracted GLCM features:

(26) $\mu = \left\{E_1 \| E_2 \| E_3 \| E_4 \| E_5\right\}$

where μ denotes the GLCM features with a dimension of [N,120,120,5]. Thus, the MSMSD-based features are obtained by concatenating all the features:

(27) $U = \left\{\beta \| \theta \| \mu\right\}$

The extracted MSMSD-based features are specified as U with dimension [N,120,120,13]. These are transferred to the distributed tetra head attention-based convolutional bidirectional network (DTCBiNet) model to classify the BTs.
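For reference, one way to compute the GLCM descriptors of Table 2 is sketched below with scikit-image and NumPy; the distance and angle settings are illustrative assumptions rather than the paper's, and the entropy is computed with the conventional negative sign.

```python
# Hedged sketch of the five GLCM descriptors of Table 2 for one grayscale image.
import numpy as np
from skimage.feature import graycomatrix   # requires a recent scikit-image

def glcm_features(gray_uint8):
    glcm = graycomatrix(gray_uint8, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    m = glcm[:, :, 0, 0]                          # normalized co-occurrence matrix M_kl
    k = np.arange(256)[:, None]
    l = np.arange(256)[None, :]
    return {
        "energy": np.sum(m ** 2),                              # E1
        "dissimilarity": np.sum(m * np.abs(k - l)),            # E2
        "homogeneity": np.sum(m / (1.0 + np.abs(k - l))),      # E3
        "entropy": -np.sum(m * np.log2(m + 1e-12)),            # E4 (negative sign by convention)
        "contrast": np.sum(m * (k - l) ** 2),                  # E5
    }

print(glcm_features(np.random.randint(0, 256, (120, 120), dtype=np.uint8)))
```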

e.
DTCBiNet for the classification of BT

The DTCBiNet model is the integration of a convolutional network and a BiLSTM network, and it is used for the classification of BT. The DTCBiNet model comprises different layers, including an input layer, a convolutional layer, a Leaky ReLU layer, a max pooling layer, a dropout layer, a reshape layer, an attention mechanism layer, a concatenation layer, a BiLSTM layer, and a dense layer. The DTCBiNet model helps extract complex patterns in the features and boosts the performance of the model with better classification accuracy. In addition, the DTCBiNet model captures both the spatial and the temporal relationships of the features, enhancing the model’s performance. The introduction of an attention mechanism in the model mitigates redundancy and effectively maximizes the learning process.

In the DTCBiNet model, the extracted feature map U, with dimension [N,120,120,13], is passed as input and fed to a 2D convolutional layer. This layer extracts further features and acts as the main component of the DTCBiNet model. In the first convolutional layer, low-level features are extracted, and higher-level features are obtained in the subsequent convolutional layers. The mathematical equation of a convolutional layer is represented by:

(28) $J_1 = \sum L \otimes fi + bi$

where the feature map is indicated by $J_1$, the bias is denoted by bi, and the convolutional filter is depicted by fi. The feature map $J_1$ is passed to the max pooling layer; if the full feature map produced by the convolutional layer were used for training directly, it would worsen overfitting and consume more time, so down-sampling is used to reduce the number of parameters and the feature map size, producing a more effective outcome. After the max pooling operation, the size is reduced to [N,60,60,16], and the result is transferred to a second 2D convolutional layer to extract features, represented by $J_2$. The output from this convolutional layer is passed to an activation function called Leaky ReLU, which is utilized because of its low computation time and simplicity. The Leaky ReLU is represented by:

(29) $LRL\left(J_2\right) = \max\left\{0, J_2\right\} + \upsilon \times \min\left\{0, J_2\right\}$

where $LRL(J_2)$ is the Leaky ReLU activation function and υ implies the Leaky ReLU parameter; the output has a dimension of [N,60,60,64]. Next, the output from the Leaky ReLU is transferred to another max pooling layer, followed by a dropout layer, giving a dimension of [N,30,30,64]. The output from the dropout layer, with size [N,30,30,64], is reshaped to [N,900,64], depicted as d, and transferred to the multi-head attention mechanisms. This multi-head attention mechanism combines various attention modules applied to the corresponding input feature: BAM, SE, ULSAM, and the CC mechanism, whose outputs are concatenated; they are elaborated in Section 4.3 with detailed explanations and diagrams. These concatenated mechanisms produce an output with a dimension of [N,900,256]. The BiLSTM consists of an input layer, a bidirectional layer, and an output layer, which boosts the model’s performance and captures features in both directions. The bidirectional layer includes both forward and backward layers, in which the forward layer extracts forward features from the input and the backward layer extracts backward features; both layers are then concatenated to produce the output.
The mathematical expressions for these BiLSTM layers are:

(30) $\overrightarrow{fl} = \sigma\left[wts\left(V, \overrightarrow{w}_{x-1}\right) + bi\right]$

(31) $\overleftarrow{bf} = \sigma\left[wts\left(V, \overleftarrow{w}_{x-1}\right) + bi\right]$

where $\overrightarrow{fl}$ and $\overleftarrow{bf}$ denote the forward and backward layer outputs, σ implies the sigmoid function, wts and bi denote the model’s weights and bias, and $\overrightarrow{w}_{x-1}$, $\overleftarrow{w}_{x-1}$ denote the forward and backward hidden layers. The output is represented as:

(32) $K = \left[\overrightarrow{fl} \oplus \overleftarrow{bf}\right]$

where the BiLSTM layer output is represented as K, and ⊕ denotes the addition operation. The output is then transferred to the next BiLSTM layer, whose output of size [N,128] is passed to a dense layer. This dense layer has a dimension of [N,64] and is followed by another dense layer with a reduced dimension of [N,32]. The final dense layer produces the output with the size [N,3], and the output is categorized into three labels, glioma, pituitary tumor, and meningioma, mathematically expressed as $U_a$. The biases and weights of the DTCBiNet model are optimized by the SEnO algorithm discussed in Section 4.6; the optimization is applied after the BiLSTM layer output. The DTCBiNet model architecture is depicted in Figure 7.

Figure 7:

Architecture of DTCBiNet. DTCBiNet, distributed tetra head attention-based convolutional bidirectional network.
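A compressed Keras sketch of this classification pipeline is given below. It is not the authors' code: the filter counts are assumptions, and the generic MultiHeadAttention layer stands in for the fused BAM/SE/ULSAM/CC heads described in Section 4.3.

```python
# Hedged sketch of the DTCBiNet pipeline: Conv2D -> pooling -> Leaky ReLU ->
# dropout -> reshape -> attention -> BiLSTM x2 -> dense layers -> softmax.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dtcbinet(input_shape=(120, 120, 13), num_classes=3):
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, padding="same")(inp)                # Eq. (28)
    x = layers.MaxPooling2D()(x)                                 # [60, 60, 16]
    x = layers.Conv2D(64, 3, padding="same")(x)
    x = layers.LeakyReLU()(x)                                    # Eq. (29)
    x = layers.MaxPooling2D()(x)                                 # [30, 30, 64]
    x = layers.Dropout(0.5)(x)
    x = layers.Reshape((900, 64))(x)                             # sequence form for attention/BiLSTM
    attn = layers.MultiHeadAttention(num_heads=4, key_dim=64)(x, x)   # stand-in for the tetra heads
    x = layers.Concatenate()([attn, x])
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)   # Eqs. (30)-(32)
    x = layers.Bidirectional(layers.LSTM(64))(x)                 # [N, 128]
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dense(32, activation="relu")(x)
    out = layers.Dense(num_classes, activation="softmax")(x)     # [N, 3]
    return models.Model(inp, out)

model = build_dtcbinet()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```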

f.
Sonar energy optimization algorithm

The SEnO algorithm is obtained by integrating bat optimization [29], the echolocation-based dolphin optimization algorithm [30], and the sparrow search algorithm (SSA) [31]. The SEnO algorithm is employed to optimize the biases and weights of the DTCBiNet model by applying the optimization at the end of the BiLSTM layer. In contrast, for training the segmentation approach, the optimization is integrated at the fifth layer to improve the model’s performance.

f.i
Motivation

The bat uses echolocation to find prey and to navigate by listening to echoes from its surroundings, allowing it to avoid obstacles even in the dark. Likewise, the dolphin uses echolocation by generating sound in the form of clicks to locate obstacles and its prey. The SEnO algorithm rapidly improves the model performance and reduces overfitting challenges. Incorporating the SEnO algorithm into the model provides better escaping, grouping, and alarming strategies and enhances the overall model mechanisms.

a)
Initialization

In the initialization phase, consider a population of h solutions in a t-dimensional search space. The algorithm’s initial population is mathematically expressed as:

(33) $F = \begin{bmatrix} F_1 \\ \vdots \\ F_i \\ \vdots \\ F_h \end{bmatrix} = \begin{bmatrix} F_{1,1} & \cdots & F_{1,q} & \cdots & F_{1,t} \\ \vdots & & \vdots & & \vdots \\ F_{i,1} & \cdots & F_{i,q} & \cdots & F_{i,t} \\ \vdots & & \vdots & & \vdots \\ F_{h,1} & \cdots & F_{h,q} & \cdots & F_{h,t} \end{bmatrix}_{h \times t}$

where $F_{i,q}$ implies the location of the ith solution in the qth dimension:

(34) $F_{i,q} = Up_{i,q} + e_1\left(Lw_{i,q} - Up_{i,q}\right), \quad i = 1,2,\ldots,h; \; q = 1,2,\ldots,t$

where t denotes the number of dimensions, i indexes the solutions in the search space, h refers to the total number of solutions, and $F_{i,q}$ denotes the initial position of the ith solution in the qth dimension. $e_1$ refers to a random number within the interval (0,1). The upper bound of the qth search-space dimension is denoted as $Up_{i,q}$ and the lower bound as $Lw_{i,q}$.

b)
Calculation of fitness function

The fitness function of the SEnO algorithm is computed based on the accuracy measure, such that the model’s performance is observed by attaining maximum accuracy, which is represented by:

(35) $Fitn = \begin{bmatrix} f_1 \\ \vdots \\ f_i \\ \vdots \\ f_h \end{bmatrix}_{h \times 1} = \begin{bmatrix} f\left(F_1\right) \\ \vdots \\ f\left(F_i\right) \\ \vdots \\ f\left(F_h\right) \end{bmatrix}_{h \times 1}$

where Fitn denotes the vector obtained from the objective function, and $f_i$ represents the objective value obtained from the ith solution.

c)
Enhancement parameter consideration

The SEnO algorithm moves toward the optimal solution using two decision factors, which can be expressed as:

(36) $M_i^{(t+1)} = \kappa M_i^{(t)}$

(37) $V_i^{(t+1)} = V_i^{(0)}\left[1 - \exp\left(-\lambda t\right)\right]$

where $M_i^{(t+1)}$ and $M_i^{(t)}$ represent the learning rate factor at the (t + 1)th and tth iterations, $V_i^{(t+1)}$ shows the pulse emission at the (t + 1)th iteration, $V_i^{(0)}$ represents the initial pulse emission, and κ, λ are the parameters used to improve the decision, falling in the ranges [0 < κ < 1] and [0 < λ < 1].

  • Case (i): Exploration phase, when $\left(Ra < V_i^t\right)$: whenever the random variable Ra, drawn from [0,1], satisfies this criterion, the exploration phase is implemented.

  • Subcase (i): Wide search based on velocity, if (S < 0.5): this case is considered when the random value S, drawn from [0,1], is less than 0.5. The wide-region searching capability is improved to extend the global search and widen the solution space for finding the best optimal solution. A velocity factor is used to speed up the exploration phase toward the best regions. Therefore, the positional update can be expressed as:

    (38) $F_i^{(t+1)} = ve_i^{(t+1)} + F_i^{(t)}$

    where $F_i^{(t+1)}$ denotes the updated solution position, $F_i^{(t)}$ denotes the current iteration solution position, and the wide search parameter is given by:

    (39) $F_{ig}^{t} = F_{i,q}^{t}\exp\left(\frac{-i}{\kappa \cdot t_{\max}}\right)$

    where κ is the constant parameter. The velocity parameter can be expressed as:

    (40) $ve_i^{(t+1)} = ve_i^{(t)} + \left(F_i^{(t+1)} - F_{best}^{(t)}\right) \cdot fr$

    where fr represents the frequency parameter, expressed as:

    (41) $fr = fr_{\min} + \left(fr_{\max} - fr_{\min}\right)\rho$

    where ρ ∈ (0,1) is a random vector taken from the uniform distribution, and $fr_{\min}$, $fr_{\max}$ are the minimum and maximum frequencies of pulse emissions.

    Subcase (ii): Random search, if Ra ≥ 0.5: this case is implemented when the random search variable, drawn from (0,1), is greater than or equal to 0.5. The updated position is mathematically expressed as:

    (42) $F_i^{(t+1)} = F_{rand}^{(t)} - Dis \cdot w$

    where Dis refers to the distance between the target and the solution, and w denotes the iteration period. The distance and the iteration term are represented as:

    (43) $Dis = \left|2\Phi \cdot F_{rand}^{(t)} - F_i^{(t)}\right|$

    (44) $w = 2\left(1 - \frac{y}{y_{\max}}\right)$

    where Φ is a random number in [0,1], and y and $y_{\max}$ represent the current iteration and the maximum number of iterations, respectively. Furthermore, the updated new-position equation is denoted by:

    (45) $F_i^{t+1} = \kappa\left[\frac{1}{Re}\left(Re - \left|z\right|\right)Fit\left(F_i\right) + F_i^{(t)}\right] + \left(1 - \kappa\right)\left[F_{rand}^{(t)} \cdot Dis \cdot w\right]$

    where κ acts as a hybridization factor in the range (0,1), and Re is the alternative index. Through this updated position equation, the solution traverses and explores new regions of the search space, and the chances of getting stuck in local optima are minimized.

  • Case (ii): Exploitation phase, if $\left(Pa \ge F_i^t\right)$: this phase is introduced when the randomly created value is greater than or equal to the pulse emission. The update of the solution depends on the average loudness and the randomness of the solution. The slower convergence rate and the exploitation-phase issues are overcome by generating a new solution, represented by:

    (46) $F_i^{(t+1)} = \partial\left[F_i^{(t)} + \varpi \cdot Avg^{(t)}\right] + \left(1 - \partial\right)\left[\left|F_{opt}^{(t)} - F_i^{(t)}\right| \cdot \cos\left(2\pi R_5\right) + F_{opt}^{(t)}\right]$

    where $Avg^{(t)}$ implies the average of all learning rate parameters, ∂ denotes the hybridization parameter in the range [0,1], ϖ denotes a random value in the range [−1, 1], $F_{opt}^{(t)}$ is the optimal solution of the current iteration, and $R_5$ denotes a random value between −1 and 1. This equation can be updated through a quick movement by considering a random variable greater than the threshold value, which enables the solution to quickly reach the optimal region of the exploitation phase. Therefore, the updated solution can be mathematically represented by:

    (47) $F_i^{(t+1)} = F_i^{t} + A \cdot B + \partial\left[F_i^{t} + \varpi Avg^{(t)}\right] + \left(1 - \partial\right)\left[\left|F_{opt}^{(t)} - F_i^{(t)}\right|\right] \cdot \cos\left(2\pi R_5\right) + F_{opt}^{(t)}$

    where A denotes a random number that obeys the normal distribution, and B represents a matrix in which each element is 1. The introduction of the quick-movement factor within the exploitation phase enables the solution to find a better optimal region. Hence, the SEnO algorithm finds the best solution by searching and exploring the location, delivering the best solution with low computational complexity and in minimum time; a simplified sketch of these update rules is given after this list.
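The following highly simplified NumPy sketch of one SEnO iteration keeps only the main control flow (pulse-rate test, velocity-based wide search, random search, and the loudness-guided exploitation of Eq. (46)); all constants, simplifications, and function names are our own illustrative assumptions, not the authors' implementation.

```python
# Simplified single iteration of a SEnO-style update over a population F.
import numpy as np

def seno_step(F, fitness, vel, V, t, t_max, lw, up, fr_min=0.0, fr_max=2.0):
    """F: [h, d] population, vel: velocities, V: pulse-emission rates per solution."""
    best = F[np.argmax([fitness(f) for f in F])]       # current best solution (max accuracy)
    avg = F.mean(axis=0)                                # stand-in for the average loudness term
    new_F = F.copy()
    for i in range(len(F)):
        if np.random.rand() < V[i]:                     # exploration phase
            if np.random.rand() < 0.5:                  # wide, velocity-driven search (Eqs. 38-41)
                fr = fr_min + (fr_max - fr_min) * np.random.rand()
                vel[i] += (F[i] - best) * fr
                new_F[i] = F[i] + vel[i]
            else:                                       # random search (Eqs. 42-44)
                rand = F[np.random.randint(len(F))]
                dis = np.abs(2 * np.random.rand() * rand - F[i])
                w = 2 * (1 - t / t_max)
                new_F[i] = rand - dis * w
        else:                                           # exploitation (Eq. 46, simplified)
            d = np.random.rand()
            new_F[i] = d * (F[i] + np.random.uniform(-1, 1) * avg) \
                     + (1 - d) * (np.abs(best - F[i]) * np.cos(2 * np.pi * np.random.uniform(-1, 1)) + best)
    return np.clip(new_F, lw, up), vel
```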

d)
Termination

The SEnO algorithm halts execution once a predefined condition is met, indicating convergence, and it then returns the most optimal solution. The flowchart outlining the SEnO algorithm’s steps is shown in Figure 8.

Figure 8:

The flowchart of the SEnO algorithm.

V.
Results and Discussion

This section describes the results obtained by the SEnO-DTCBiNet model, the evaluation metrics utilized to evaluate the SEnO-DTCBiNet model, the dataset description, and the experimental setup.

a.
Experimental setup

The SEnO-DTCBiNet framework was developed using Python 3.7 on a Windows 11 platform. The system setup includes a 13th-generation Intel Core i7-13700K processor, 16 GB of RAM, and an Nvidia GeForce RTX 3080 Ti GPU with 12 GB of dedicated memory. Development was carried out using the PyCharm Community Edition, with storage handled by a 128 GB ROM. The hyperparameters used by the SEnO-DTCBiNet model for classification are displayed in Table 3.

Table 3:

Hyperparameters of the SEnO-DTCBiNet model

Hyperparameters | Values
Kernel size | (3 × 3)
Pooling size | (2, 2)
Convolution 2D layers | 2
Activation function | ReLU
Learning rate | 0.02
Dropout rate | 0.5
No. of BiLSTM layers | 2
LSTM units | 64
Loss function | Categorical cross-entropy
Optimizer | Adam
Number of epochs | 100
Pooling type | MaxPooling2D
Metrics | Accuracy
Padding | Same
Stride size | 2

SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.

b.
Dataset description

Datasets included in the research are described below:

  • BT dataset [17]: It consists of three different kinds of BT with 3,064 images. The collected images are split into four subgroups and stored in four .zip files.

  • BraTS 2020 dataset [18]: It comprises MR images from 369 patients, with four types of MR images gathered from each patient for the training set. The validation set consists of MR images from 125 patients, collected from 19 different institutes under several clinical protocols.

BraTS 2018 dataset [19], the training set comprises 285 patients; among them, 210 patients have high-grade glioma and 75 patients have low-grade glioma. Likewise, the validation set includes 66 patients with unknown-grade BTs. The dataset visualization is illustrated in Figure 9.

Figure 9:

Dataset visualization. (A) BraTS 2020 dataset, (B) BT dataset, and (C) BraTS 2018 dataset. BT, brain tumor.

c.
Evaluation metrics

Precision: It measures the proportion of the SEnO-DTCBiNet model’s positive classifications that are truly positive, and it is denoted as:
(48) $pre = \dfrac{te_{ps}}{te_{ps} + fl_{ps}}$
where pre indicates the precision measure, the true positives are represented as $te_{ps}$, and the false positives are denoted as $fl_{ps}$.

Recall: It measures the percentage of all actual positives that are classified correctly by the SEnO-DTCBiNet model as positive, and it is represented as:
(49) $re = \dfrac{te_{ps}}{te_{ps} + fl_{ng}}$
where $fl_{ng}$ is the false negative and re indicates the recall measure.

Accuracy: It measures the SEnO-DTCBiNet model’s overall correctness in classifying data, and it is represented as:
(50) $acc = \dfrac{te_{ps} + te_{ng}}{te_{ps} + te_{ng} + fl_{ps} + fl_{ng}}$
where accuracy is denoted as acc and the true negatives are represented as $te_{ng}$.

F1-score: It estimates the performance of the SEnO-DTCBiNet model via the harmonic mean of recall and precision, which is denoted as:
(51) $F1 = 2 \times \dfrac{pre \times re}{pre + re}$
where the F1-measure is signified as F1.
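For reference, the following minimal Python helper computes Eqs. (48)-(51) directly from true-positive, false-positive, true-negative, and false-negative counts; the counts in the example call are arbitrary illustrative numbers.

def classification_metrics(te_ps, fl_ps, te_ng, fl_ng):
    """Precision, recall, accuracy, and F1-score as defined in Eqs. (48)-(51)."""
    pre = te_ps / (te_ps + fl_ps)                             # Eq. (48)
    re = te_ps / (te_ps + fl_ng)                              # Eq. (49)
    acc = (te_ps + te_ng) / (te_ps + te_ng + fl_ps + fl_ng)   # Eq. (50)
    f1 = 2 * pre * re / (pre + re)                            # Eq. (51)
    return pre, re, acc, f1

# Example with illustrative counts
print(classification_metrics(te_ps=90, fl_ps=5, te_ng=95, fl_ng=10))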

d.
Experimental results

The image results of the SEnO-DTCBiNet model across the BraTS 2018 dataset, BraTS 2020 dataset, and the BT dataset are represented in Figures 10–12, respectively.

Figure 10:

Image results based on the BraTS 2020 dataset. ROI, region of interest.

Figure 11:

BT dataset-based image results. BT, brain tumor; ROI, region of interest.

Figure 12:

BraTS 2018 dataset-based image results. ROI, region of interest.

e.
Performance analysis

The model performance across the BraTS 2020, BT, and BraTS 2018 datasets for different amounts of training data and numbers of epochs is presented in this section.

e.i
BraTS 2020 dataset-based performance analysis

Figure 13 represents the SEnO-DTCBiNet model performance on the BraTS 2020 dataset, assessed using varying amounts of training data. The precision of the SEnO-DTCBiNet model at epoch 100 with training percentages of 50%, 70%, and 90% is 94.21%, 95.87%, and 97.30%, respectively. The SEnO-DTCBiNet model’s recall at epoch 60 is 94.45% and at epoch 100 is 97.88%. Likewise, the accuracy values achieved by the SEnO-DTCBiNet model at epochs 20–100 with 90% of training data are 88.59%, 90.00%, 92.49%, 93.59%, and 97.17%, and the corresponding F1-scores are 87.06%, 87.36%, 88.03%, 88.27%, and 97.25%, respectively.

Figure 13:

SEnO-DTCBiNet model performance on the BraTS 2020 dataset. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.

e.ii
BT dataset-based performance analysis

Figure 14 represents the proposed SEnO-DTCBiNet model’s effectiveness on the BT dataset. The precision obtained by the SEnO-DTCBiNet model at epochs 100, 60, and 20 for 90% of the training data is 96.86%, 92.58%, and 91.10%, respectively. Similarly, the recall of the SEnO-DTCBiNet model at epoch 40 is 92.31%, increases to 93.84% at epoch 80, and reaches 98.72% at epoch 100. The SEnO-DTCBiNet model’s accuracy at epochs 20–100 is 92.53%, 93.15%, 94.33%, 95.50%, and 98.48%. The SEnO-DTCBiNet model achieved an F1-score of 96.54%, 97.78%, and 97.78% at epoch 100, corresponding to training percentages of 60%, 80%, and 90%, respectively.

Figure 14:

Performance of the SEnO-DTCBiNet model using the BT dataset. BT, brain tumor. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.

e.iii
BraTS 2018 dataset-based performance analysis

The SEnO-DTCBiNet model performance evaluated under varying training percentages is presented in Figure 15. The precision of the SEnO-DTCBiNet model at epochs 20–100 is 87.45%, 91.38%, 93.50%, 93.75%, and 94.16% for a training percentage of 90%. Similarly, the recall values obtained by the SEnO-DTCBiNet model at epochs 60 and 100 with 90% of training data are 90.46% and 97.42%, and the corresponding accuracy values are 92.29% and 97.76%, respectively. The SEnO-DTCBiNet model F1-score at epoch 20 is 86.18%, rises to 89.83% at epoch 60, and finally reaches 95.76% at epoch 100.

Figure 15:

Performance of SEnO-DTCBiNet model using the BraTS 2018 dataset. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.

f.
Comparative methods

The comparative approaches are Dense-CNN [1], CJHBA-DRN [4], EDN-SVM [11], 2D-CNN-CAE [15], PatchResNet [7], Bat-Algorithm-based multihead convolutional bidirectional memory (BA-MCBM) [29], Dolphin swarm algorithm-based multihead bidirectional convolutional memory (DE-MCBM) [30], multihead sonar prey optimized convolutional bidirectional memory (SPO-MCBM), and Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network (SSA-DTCBiNet) [31].

f.i
Comparative analysis on the BraTS 2020 dataset in terms of training percentage

The SEnO-DTCBiNet model performance against several traditional approaches on the BraTS 2020 dataset is presented in Figure 16. The precision of the SEnO-DTCBiNet model is 97.30%, which is higher than that of the EDN-SVM by 12.31%, 2D-CNN-CAE by 8.16%, CJHBA-DRN by 10.25%, BA-MCBM by 7.28%, Dense-CNN by 14.27%, PatchResNet by 9.66%, DE-MCBM by 6.36%, SSA-DTCBiNet by 6.82%, and SPO-MCBM by 1.86% for 90% of the training data. The SEnO-DTCBiNet model achieved a recall value of 97.88%, which indicates an enhancement of 5.06%, 2.45%, 4.61%, 1.85%, 6.11%, 4.24%, 1.68%, 1.76%, and 0.11% over the EDN-SVM, 2D-CNN-CAE, CJHBA-DRN, BA-MCBM, Dense-CNN, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM, respectively. The accuracy of the SEnO-DTCBiNet model is 97.17%, and it is improved by 10.05% over EDN-SVM, 6.63% over 2D-CNN-CAE, 8.59% over CJHBA-DRN, 6.10% over BA-MCBM, 11.29% over Dense-CNN, 7.49% over PatchResNet, 3.59% over DE-MCBM, 4.85% over SSA-DTCBiNet, and 0.65% over SPO-MCBM for 90% of the training data. Likewise, the SEnO-DTCBiNet model attained an F1-score of 97.25%, which is an improved value over the other comparative approaches.

Figure 16:

SEnO-DTCBiNet model performance comparison on the BraTS 2020 dataset. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.

f.ii
K-fold cross-validation using the BraTS 2020 dataset

The SEnO-DTCBiNet model performance against the other conventional models based on K-fold cross-validation using the BraTS 2020 dataset is shown in Figure 17. K-fold cross-validation is performed for KF-4 to KF-10 against the baseline models, along with the proposed SEnO-DTCBiNet. The SEnO-DTCBiNet model achieved 96.27% accuracy, which is an improvement over Dense-CNN and EDN-SVM by 8.92% and 6.31%, respectively, at fold 10. Likewise, in precision, the proposed model attains a higher value of 95.53%, showing improvements of 5.99% over CJHBA-DRN and 3.44% over PatchResNet. The recall value achieved by the SEnO-DTCBiNet framework is 97.74%, showing an enhancement of 3.69% over 2D-CNN-CAE and 2.52% over BA-MCBM, respectively. The F1-score of the SEnO-DTCBiNet model is 96.62%, which is greater than DE-MCBM by 1.18% and SSA-DTCBiNet by 0.32%.
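A minimal sketch of a K-fold evaluation loop of the kind summarized in Figure 17 is shown below, using scikit-learn's KFold; the data arrays, the build_model constructor, and the use of accuracy as the scored metric are assumptions for illustration.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

def kfold_accuracy(X, y, build_model, n_splits=10):
    """Average validation accuracy over a K-fold split (KF-4 ... KF-10 in the text)."""
    scores = []
    for train_idx, val_idx in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        model = build_model()                      # fresh model per fold (placeholder constructor)
        model.fit(X[train_idx], y[train_idx])      # assumes a scikit-learn style estimator
        scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))
    return float(np.mean(scores))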

Figure 17:

K-fold analysis on the BraTS 2020 dataset.

f.iii
Training percentage-based comparative analysis using the BT dataset

The comparative results of the SEnO-DTCBiNet model against the baseline models concerning training percentage on the BT dataset are presented in Figure 18. The SEnO-DTCBiNet model obtained a precision of 96.86%, which is higher than that of the comparative EDN-SVM, 2D-CNN-CAE, CJHBA-DRN, BA-MCBM, Dense-CNN, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM approaches. The recall of the SEnO-DTCBiNet model is 98.72%, and it is improved by 6.29% over EDN-SVM, 4.03% over 2D-CNN-CAE, 5.83% over CJHBA-DRN, 2.93% over BA-MCBM, 7.06% over Dense-CNN, 5.65% over PatchResNet, 2.61% over DE-MCBM, 1.40% over SSA-DTCBiNet, and 0.20% over SPO-MCBM. The SEnO-DTCBiNet model achieved an accuracy of 98.48%, which is higher than EDN-SVM by 9.57%, 2D-CNN-CAE by 6.80%, CJHBA-DRN by 7.46%, BA-MCBM by 6.76%, Dense-CNN by 10.06%, PatchResNet by 7.24%, DE-MCBM by 4.23%, SSA-DTCBiNet by 2.40%, and SPO-MCBM by 0.57% for 90% of the training data. The F1-score of the SEnO-DTCBiNet model is 97.98%, which displays an enhancement of 8.62%, 5.72%, 8.40%, 4.96%, 9.89%, 7.08%, 2.78%, 1.58%, and 0.38% over the EDN-SVM, 2D-CNN-CAE, CJHBA-DRN, BA-MCBM, Dense-CNN, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM, respectively.

Figure 18:

Comparison of the SEnO-DTCBiNet model based on the training percentage using the BT dataset. BT, brain tumor. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.

f.iv
K-fold cross-validation using BT dataset

The proposed SEnO-DTCBiNet model’s performance against the other traditional models, based on K-fold cross-validation using the BT dataset, is depicted in Figure 19. The proposed SEnO-DTCBiNet model achieved 96.23% accuracy, showing improved performance over Dense-CNN by 7.18% and EDN-SVM by 4.92%. Additionally, its precision reached a high value of 95.49%, outperforming CJHBA-DRN by 4.65% and PatchResNet by 4.31%. The SEnO-DTCBiNet framework also demonstrated a superior recall value of 97.71%, representing enhancements of 0.66% and 0.33% over 2D-CNN-CAE and BA-MCBM, respectively. Furthermore, the model’s F1-score reached 96.59% at fold-10, exceeding DE-MCBM by 1.06% and SSA-DTCBiNet by 0.81%. The proposed SEnO-DTCBiNet framework significantly advances BT classification by achieving improved performance, making it a more effective tool for analysis compared with conventional methods.

Figure 19:

K-fold analysis on the BT dataset. BT, brain tumor.

f.v
Comparative analysis based on training percentage using BraTS 2018 dataset

Figure 20 presents the performance comparison of the proposed SEnO-DTCBiNet model, which attained a precision of 94.16%, showing an improvement of 4.47%, 1.48%, 4.32%, 1.15%, 5.12%, 2.66%, 1.08%, 0.70%, and 0.28% over the EDN-SVM, 2D-CNN-CAE, CJHBA-DRN, BA-MCBM, Dense-CNN, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM, respectively. The SEnO-DTCBiNet model’s recall and accuracy with 90% of training data are 97.42% and 97.76%, respectively. The F1-score of the SEnO-DTCBiNet model is 95.76%, which is greater than the EDN-SVM by 5.73%, 2D-CNN-CAE by 3.00%, CJHBA-DRN by 4.57%, BA-MCBM by 2.29%, Dense-CNN by 6.29%, PatchResNet by 3.68%, DE-MCBM by 1.90%, SSA-DTCBiNet by 1.35%, and SPO-MCBM by 0.34%.

Figure 20:

Comparison of the SEnO-DTCBiNet model based on training percentage using the BraTS 2018 dataset.

f.vi
K-fold cross-validation using the BraTS 2018 dataset

The model evaluation based on K-fold cross-validation against the baseline models on the BraTS 2018 dataset is depicted in Figure 21. At fold 10, the SEnO-DTCBiNet model achieved an accuracy of 95.51%, exceeding Dense-CNN and EDN-SVM by 6.70% and 5.96%, respectively. The precision also registered a higher value of 95.37%, showing an improvement of 3.68% over CJHBA-DRN and 2.45% over PatchResNet. The SEnO-DTCBiNet framework’s recall reached 95.79%, outperforming 2D-CNN-CAE by 3.77% and BA-MCBM by 2.75%. Finally, at fold-10, the model secured an F1-score of 96.62%, which is 1.18% greater than DE-MCBM and 0.32% greater than SSA-DTCBiNet. By consistently beating multiple baseline models across different K-folds, the SEnO-DTCBiNet model demonstrates superior performance and robustness.

Figure 21:

K-fold based on the BraTS 2018 dataset.

f.vii
Comparative analysis based on training percentage using the real-time dataset

The SEnO-DTCBiNet model performance evaluation against the other baseline models using the real-time dataset is presented in Figure 22. The SEnO-DTCBiNet model achieved 96.35% accuracy, performing better than the Dense-CNN by 5.40%, EDN-SVM by 4.46%, and CJHBA-DRN by 4.06% when trained on 90% of the data. The model attains a precision of 95.98%, showing improvements over models such as 2D-CNN-CAE, BA-MCBM, and PatchResNet by 2.10%, 1.81%, and 2.82%, respectively. The SEnO-DTCBiNet model achieved a recall value of 97.10%, which is 1.25% better than DE-MCBM and 0.21% better than SSA-DTCBiNet. Additionally, the F1-score of the model is 96.54%, a gain of 0.38% over SPO-MCBM. Overall, these results indicate that the SEnO-DTCBiNet model offers an effective performance enhancement, particularly in the context of BT classification, when compared with conventional models.

Figure 22:

Comparison of the SEnO-DTCBiNet model on the real-time dataset concerning training percentage. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.

f.viii
K-fold cross validation using the real-time dataset

Figure 23 represents the SEnO-DTCBiNet model outcomes against the traditional methods based on K-fold cross-validation using the real-time dataset. The SEnO-DTCBiNet model achieved an accuracy of 96.47%, surpassing Dense-CNN by 8.78% and EDN-SVM by 6.23% at K-fold 10. Furthermore, the model demonstrated a high precision of 95.78%, outperforming approaches such as 2D-CNN-CAE, BA-MCBM, and PatchResNet by 2.38%, 2.33%, and 3.52%, respectively. In recall, the SEnO-DTCBiNet reached 97.87%, which is 0.54% higher than DE-MCBM and 0.43% above SSA-DTCBiNet. Its F1-score also reaches a higher value of 96.81%, presenting an improvement of 0.80% over SPO-MCBM. These results highlight the SEnO-DTCBiNet framework as a significant advancement in performance, particularly in BT classification, compared with other conventional models.

Figure 23:

K-fold analysis on the real-time dataset.

g.
ROC analysis

Figure 24 portrays the receiver operating characteristic (ROC) curves of the SEnO-DTCBiNet framework against the other comparative models, including the CJHBA-DRN, BA-MCBM, 2D-CNN-CAE, Dense-CNN, EDN-SVM, PatchResNet, DE-MCBM, SSA-DTCBiNet, and SPO-MCBM methods, based on the three datasets. This ROC analysis indicates that the SEnO-DTCBiNet model performs significantly better than the other comparative methods.
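ROC curves such as those in Figure 24 can be derived from the predicted class probabilities of any of the compared classifiers; the following one-versus-rest sketch uses scikit-learn, with placeholder label and score arrays.

import numpy as np
from sklearn.metrics import roc_curve, auc

# y_true: true binary labels for one tumor class (one-vs-rest);
# y_score: predicted probability for that class; both arrays are illustrative placeholders.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2])

fpr, tpr, _ = roc_curve(y_true, y_score)   # false-positive and true-positive rates
print("AUC:", auc(fpr, tpr))               # area under the ROC curve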

Figure 24:

ROC analysis. (A) BraTS 2020 dataset, (B) BT dataset, and (C) BraTS 2018 dataset. BT, brain tumor.

h.
Time complexity analysis

The time complexity of the SEnO-DTCBiNet model against other existing methods using the BraTS 2020, BT, and BraTS 2018 datasets is depicted in Figure 25. The analysis reveals that the SEnO-DTCBiNet model requires less time for tumor classification as the number of iterations increases, outperforming the comparative approaches.

Figure 25:

Time complexity analysis. (A) BraTS 2020 dataset, (B) BT dataset, and (C) BraTS 2018 dataset. BT, brain tumor.

i.
Convergence analysis

The convergence analysis of the SEnO optimization compared with approaches such as RMSProp, Adam, BA, SSA, DEA, and SPO is depicted in Figure 26. The analysis indicates that SEnO’s loss value decreases more effectively than its counterparts as the number of epochs increases. At epoch 50, the SSA and DEA methods reach losses of 6.73 × 10⁻⁷ and 3.13 × 10⁻⁶, respectively, whereas the proposed optimization reaches a loss of 1.00 × 10⁻⁶. Similarly, at epoch 99, the proposed optimization retains the lower loss value of 1.00 × 10⁻⁶, whereas the existing approaches RMSProp, Adam, and BA reach losses of 0.630, 0.002, and 0.001, respectively. These findings therefore demonstrate the effectiveness of the SEnO optimization in achieving an optimal solution for classifying the BT.

Figure 26:

Convergence analysis. SSA, Sparrow Search Algorithm.

j.
Computational complexity analysis in terms of floating-point operations per second (FLOPS)

Figure 27 presents the computational complexity of the proposed BT detection model and various other methods in terms of FLOPs, where one floating-point operation is a single arithmetic operation, such as an addition or multiplication, performed on floating-point numbers. At epoch 50, existing methods exhibit high FLOP requirements, with Dense-CNN reporting 1.59 × 10⁸ and EDN-SVM reporting 1.45 × 10⁸, whereas the proposed model utilizes fewer FLOPs at 1.42 × 10⁸. Similarly, at epoch 99, the SSA-DTCBiNet and SPO-MCBM use 2.08 × 10⁸ FLOPs; in comparison, the proposed SEnO-DTCBiNet model reduces the computational complexity to 2.05 × 10⁸ FLOPs through the SEnO algorithm, which configures the model to enhance classification performance.
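As a rough illustration of how FLOP counts of this order arise, the helper below approximates the cost of a single Conv2D layer by counting one multiplication and one addition per weight application; this is a generic rule of thumb, not the exact profiling procedure behind Figure 27.

def conv2d_flops(out_h, out_w, out_channels, kernel_h, kernel_w, in_channels):
    """Approximate FLOPs of one Conv2D layer: 2 * multiply-accumulate operations."""
    macs = out_h * out_w * out_channels * kernel_h * kernel_w * in_channels
    return 2 * macs

# Example: a 3 x 3 convolution producing a 64 x 64 x 32 map from a single-channel input
print(conv2d_flops(64, 64, 32, 3, 3, 1))  # roughly 2.4 million FLOPs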

Figure 27:

Computational complexity analysis in terms of FLOPS. FLOPS, floating-point operations per second.

k.
Ablation study for model components

As described in Figure 28, an ablation study is performed to evaluate the individual contributions of the architectural components of the proposed model. Starting with a BiLSTM alone, the model achieved 93.68% accuracy, effectively capturing temporal dependencies. Incorporating the SEnO optimizer with the CNN improved performance to 93.96%, showing the benefit of optimized weight updates in the convolutional network. The CBiNet model individually reached 94.64%, while the combination of SEnO + BiLSTM yielded 95.24%, demonstrating enhanced training stability. Adding the tetra head attention to CBiNet further increased performance to 95.74%, and SEnO + CBiNet reached 96.35% by unifying optimization and the hybrid architecture. Finally, the complete SEnO-DTCBiNet model achieved an accuracy of 97.18%, validating the effectiveness of combining CNN, BiLSTM, SEnO optimization, and multihead attention in one unified framework. These results confirm that each component contributes a layered improvement in classification, with the fully integrated model delivering the most robust performance.

Figure 28:

Ablation study for SEnO-DTCBiNet model components. SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network.

l.
Ablation study for the attention mechanism

An ablation study focused on the impact of individual attention mechanisms and their combinations is depicted in Figure 29. When used independently, BAM, SE, ULSAM, and CC achieved accuracies of 87.57%, 85.68%, 91.55%, and 92.54%, respectively, each excelling in different aspects, such as channel recalibration and spatial attention. Combining BAM + SE improved accuracy to 95.55%, and CC + BAM + SE achieved 94.55%, reflecting the complementary effect of combining spatial and channel attention. The trio of ULSAM + SE + CC achieved 96.46% accuracy, showcasing the synergy between cross-channel, lightweight, and contextual attention mechanisms. Ultimately, the full DTHA configuration combining BAM, SE, ULSAM, and CC achieved the peak accuracy of 97.18%, proving that fusing diverse attention modules significantly boosts feature representation. These attention mechanisms, when integrated into the proposed model, enhance fine-grained tumor localization and classification, making SEnO-DTCBiNet a highly effective tool for classifying BTs.

Figure 29:

Ablation study for attention mechanisms. CC, criss-cross; DTHA, distributed tetra head attention; SE, squeeze and excitation.

m.
Confusion matrix

The SEnO-DTCBiNet model confusion matrix across the BraTS 2020, BT, and BraTS 2018 datasets is represented in Figure 30. This matrix serves to validate the model’s performance by comparing its predicted classifications against the actual labels.

Figure 30:

Confusion matrix.

n.
Statistical analysis

The statistical analysis of the proposed SEnO-DTCBiNet model on the BraTS 2020 dataset is presented in Table 4, on the BT dataset in Table 5, on the BraTS 2018 dataset in Table 6, and on the real-time dataset in Table 7.
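The best, mean, variance, and standard deviation entries reported in Tables 4–7 can be obtained from repeated evaluation scores as in the following NumPy sketch; the score list in the example is an illustrative placeholder.

import numpy as np

def summarize_runs(scores):
    """Best, mean, variance, and standard deviation over repeated runs (as in Tables 4-7)."""
    s = np.asarray(scores, dtype=float)
    return {
        "best": float(s.max()),
        "mean": float(s.mean()),
        "variance": float(s.var()),
        "std": float(s.std()),
    }

# Example with placeholder accuracy scores from repeated runs
print(summarize_runs([92.1, 94.8, 96.3, 90.5, 95.9]))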

Table 4:

BraTS 2020 dataset-based statistical analysis

Metric	Statistic	PatchResNet	BA-MCBM	Dense-CNN	CJHBA-DRN	EDN-SVM	2D-CNN-CAE	SPO-MCBM	SSA-DTCBiNet	DE-MCBM	SEnO-DTCBiNet
Accuracy	Best	92.58	93.91	87.68	90.85	90.19	93.05	94.89	95.92	96.16	96.27
Accuracy	Mean	87.40	88.59	84.81	86.49	85.78	87.96	89.33	90.24	90.70	92.34
Accuracy	Variance	14.65	13.72	4.27	10.55	9.03	12.63	14.05	15.28	13.50	10.02
Accuracy	Standard deviation	3.83	3.70	2.07	3.25	3.01	3.55	3.75	3.91	3.67	3.16
Precision	Best	92.24	93.22	87.57	89.81	89.33	92.50	93.65	95.11	95.39	95.53
Precision	Mean	87.02	87.80	84.92	85.98	85.69	87.28	88.50	89.46	89.89	90.51
Precision	Variance	13.94	14.22	4.09	7.56	7.31	14.04	12.45	14.85	13.31	12.81
Precision	Standard deviation	3.73	3.77	2.02	2.75	2.70	3.75	3.53	3.85	3.65	3.58
Recall	Best	93.26	95.28	87.89	92.94	91.92	94.14	97.38	97.55	97.69	97.75
Recall	Mean	88.17	90.17	84.59	87.50	85.94	89.34	91.00	91.78	92.31	95.99
Recall	Variance	16.34	13.51	4.76	18.41	13.76	10.43	17.68	16.26	13.95	7.93
Recall	Standard deviation	4.04	3.68	2.18	4.29	3.71	3.23	4.20	4.03	3.74	2.82
F1-score	Best	92.75	94.24	87.73	91.35	90.61	93.31	95.48	96.31	96.53	96.63
F1-score	Mean	87.59	88.96	84.76	86.73	85.81	88.29	89.73	90.61	91.09	93.16
F1-score	Variance	15.03	13.60	4.37	12.19	10.01	12.07	14.83	15.50	13.60	9.17
F1-score	Standard deviation	3.88	3.69	2.09	3.49	3.16	3.47	3.85	3.94	3.69	3.03

BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

Table 5:

BT dataset-based statistical analysis

Metric	Statistic	SSA-DTCBiNet	Dense-CNN	BA-MCBM	EDN-SVM	2D-CNN-CAE	CJHBA-DRN	PatchResNet	DE-MCBM	SPO-MCBM	SEnO-DTCBiNet
Accuracy	Best	95.24	89.32	94.84	91.49	94.71	92.19	92.91	94.97	95.63	96.24
Accuracy	Mean	90.89	85.44	89.21	86.21	88.51	87.35	87.94	90.36	91.80	93.10
Accuracy	Variance	16.56	9.66	19.58	14.07	19.31	14.63	14.81	17.29	14.61	8.64
Accuracy	Standard deviation	4.07	3.11	4.43	3.75	4.39	3.82	3.85	4.16	3.82	2.94
Precision	Best	94.05	89.73	93.59	90.62	93.53	91.05	91.37	93.73	94.60	95.49
Precision	Mean	90.41	85.64	88.61	86.06	88.26	87.03	87.59	90.11	91.44	93.02
Precision	Variance	12.50	10.48	14.22	11.83	14.31	10.83	9.82	12.12	11.31	4.85
Precision	Standard deviation	3.54	3.24	3.77	3.44	3.78	3.29	3.13	3.48	3.36	2.20
Recall	Best	97.63	88.49	97.34	93.23	97.07	94.49	96.00	97.47	97.69	97.72
Recall	Mean	91.86	85.05	90.41	86.49	89.00	88.00	88.64	90.86	92.53	93.27
Recall	Variance	26.47	8.24	33.81	19.70	31.94	24.11	28.37	30.91	22.62	20.79
Recall	Standard deviation	5.14	2.87	5.82	4.44	5.65	4.91	5.33	5.56	4.76	4.56
F1-score	Best	95.81	89.11	95.43	91.91	95.27	92.74	93.63	95.56	96.12	96.59
F1-score	Mean	91.12	85.34	89.49	86.27	88.62	87.50	88.09	90.47	91.98	93.12
F1-score	Variance	18.71	9.27	22.52	15.29	22.03	16.68	17.59	20.20	16.37	11.21
F1-score	Standard deviation	4.33	3.04	4.75	3.91	4.69	4.08	4.19	4.49	4.05	3.35

BT, Brain tumor; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

Table 6:

BraTS 2018 dataset-based statistical analysis

Metric	Statistic	Dense-CNN	EDN-SVM	CJHBA-DRN	PatchResNet	2D-CNN-CAE	BA-MCBM	DE-MCBM	SSA-DTCBiNet	SPO-MCBM	SEnO-DTCBiNet
Accuracy	Best	89.11	89.81	91.60	92.62	93.17	93.75	94.22	94.59	94.78	95.51
Accuracy	Mean	84.85	85.64	87.00	87.87	88.95	89.39	89.94	90.23	90.96	92.12
Accuracy	Variance	7.40	7.53	11.51	12.33	12.91	13.83	15.52	15.21	14.40	11.72
Accuracy	Standard deviation	2.72	2.74	3.39	3.51	3.59	3.72	3.94	3.90	3.79	3.42
Precision	Best	90.35	91.25	91.85	93.03	93.66	94.04	94.56	94.83	94.95	95.37
Precision	Mean	85.40	86.28	87.42	87.93	89.14	89.58	90.22	90.48	91.17	91.98
Precision	Variance	10.32	10.30	11.23	12.63	13.35	13.60	15.39	14.57	15.85	14.38
Precision	Standard deviation	3.21	3.21	3.35	3.55	3.65	3.69	3.92	3.82	3.98	3.79
Recall	Best	86.64	86.95	91.10	91.81	92.18	93.16	93.54	94.13	94.43	95.80
Recall	Mean	83.76	84.37	86.17	87.73	88.57	89.02	89.36	89.73	90.55	92.40
Recall	Variance	3.09	3.44	12.44	12.27	12.36	14.46	16.18	16.68	11.99	7.68
Recall	Standard deviation	1.76	1.85	3.53	3.50	3.52	3.80	4.02	4.08	3.46	2.77
F1-score	Best	88.46	89.05	91.48	92.42	92.92	93.60	94.05	94.48	94.69	95.58
F1-score	Mean	84.57	85.31	86.79	87.83	88.85	89.30	89.79	90.10	90.86	92.19
F1-score	Variance	6.06	6.23	11.71	12.23	12.73	13.96	15.64	15.57	13.74	10.58
F1-score	Standard deviation	2.46	2.50	3.42	3.50	3.57	3.74	3.95	3.95	3.71	3.25

BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

Table 7:

Real-time dataset-based statistical analysis

Metric	Statistic	Dense-CNN	EDN-SVM	CJHBA-DRN	PatchResNet	2D-CNN-CAE	BA-MCBM	DE-MCBM	SSA-DTCBiNet	SPO-MCBM	SEnO-DTCBiNet
Accuracy	Best	88.00	90.46	91.23	92.92	93.66	94.52	95.38	95.52	95.58	96.48
Accuracy	Mean	84.17	85.45	86.68	87.59	88.59	89.34	90.31	90.90	91.68	93.11
Accuracy	Variance	6.30	11.08	11.89	14.74	16.74	15.91	13.40	12.30	9.84	6.77
Accuracy	Standard deviation	2.51	3.33	3.45	3.84	4.09	3.99	3.66	3.51	3.14	2.60
Precision	Best	89.65	90.03	90.83	92.41	93.50	93.54	94.39	94.55	94.61	95.78
Precision	Mean	84.94	85.55	86.29	87.12	88.20	88.70	89.83	90.54	90.71	92.31
Precision	Variance	9.44	9.69	10.04	13.57	17.21	13.70	10.50	10.22	9.99	6.81
Precision	Standard deviation	3.07	3.11	3.17	3.68	4.15	3.70	3.24	3.20	3.16	2.61
Recall	Best	84.71	91.33	92.04	93.96	93.96	96.47	97.34	97.45	97.51	97.87
Recall	Mean	82.62	85.25	87.47	88.54	89.36	90.61	91.25	91.62	93.62	94.71
Recall	Variance	1.98	14.53	16.85	17.50	15.99	20.95	20.62	17.64	9.58	6.74
Recall	Standard deviation	1.41	3.81	4.10	4.18	4.00	4.58	4.54	4.20	3.09	2.60
F1-score	Best	87.73	90.61	91.35	92.75	93.31	94.24	95.48	96.31	96.53	96.63
F1-score	Mean	84.76	85.81	86.73	87.59	88.29	88.96	89.73	90.61	91.09	93.16
F1-score	Variance	4.37	10.01	12.19	15.03	12.07	13.60	14.83	15.50	13.60	9.17
F1-score	Standard deviation	2.09	3.16	3.49	3.88	3.47	3.69	3.85	3.94	3.69	3.03

BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

o.
T-test analysis

A T-test is performed to compare the means of the different methods. A p-value of <0.05 indicates a statistically significant difference between the results of the compared models. Table 8 represents the T-test analysis on the BraTS 2020 dataset, Table 9 presents the T-test analysis on the BT dataset, Table 10 depicts the T-test analysis on the BraTS 2018 dataset, and Table 11 represents the T-test analysis results on the real-time dataset. Thus, the results confirm that the model enhances BT classification performance over the baseline models across the different datasets.
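A minimal sketch of such a T-test, using SciPy's two-sample test on per-run accuracy scores, is shown below; the two score arrays are placeholders, and the choice of an independent (rather than paired) test is an assumption.

import numpy as np
from scipy import stats

# Placeholder per-run accuracies of the proposed model and one baseline
proposed = np.array([96.1, 95.8, 96.4, 96.9, 95.5])
baseline = np.array([93.2, 92.8, 94.0, 93.5, 92.6])

t_stat, p_value = stats.ttest_ind(proposed, baseline)
print(f"T-statistic = {t_stat:.2f}, p-value = {p_value:.3f}")
# p < 0.05 would indicate a statistically significant difference between the models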

Table 8:

T-test analysis based on the BraTS 2020 dataset

Methods	p-value (Accuracy)	T-statistic (Accuracy)	p-value (Precision)	T-statistic (Precision)	p-value (Recall)	T-statistic (Recall)	p-value (F1-score)	T-statistic (F1-score)
EDN-SVM	0.12	2.20	0.10	2.39	0.16	1.87	0.13	2.11
Dense-CNN	0.09	2.42	0.09	2.53	0.12	2.19	0.10	2.36
CJHBA-DRN	0.11	2.21	0.11	2.27	0.12	2.13	0.12	2.19
PatchResNet	0.11	2.24	0.12	2.14	0.10	2.40	0.11	2.28
BA-MCBM	0.13	2.11	0.11	2.25	0.18	1.76	0.13	2.04
2D-CNN-CAE	0.13	2.07	0.11	2.22	0.19	1.70	0.14	2.00
DE-MCBM	0.14	2.02	0.13	2.11	0.16	1.86	0.14	1.98
SSA-DTCBiNet	0.12	2.20	0.11	2.22	0.12	2.16	0.12	2.19
SPO-MCBM	0.11	2.21	0.12	2.17	0.11	2.28	0.11	2.22
SEnO-DTCBiNet	0.08	2.58	0.11	2.24	0.06	3.00	0.07	2.71

BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

Table 9:

T-test Analysis based on the BT dataset

Methods	p-value (Precision)	T-statistic (Precision)	p-value (F1-score)	T-statistic (F1-score)	p-value (Accuracy)	T-statistic (Accuracy)	p-value (Recall)	T-statistic (Recall)
DE-MCBM	0.09	2.54	0.10	2.36	0.09	2.41	0.11	2.24
EDN-SVM	0.14	1.98	0.14	1.96	0.14	1.97	0.15	1.92
CJHBA-DRN	0.10	2.37	0.11	2.25	0.11	2.29	0.12	2.17
PatchResNet	0.09	2.42	0.11	2.22	0.11	2.28	0.13	2.08
Dense-CNN	0.13	2.05	0.12	2.20	0.12	2.15	0.10	2.35
BA-MCBM	0.10	2.38	0.11	2.25	0.11	2.29	0.12	2.15
2D-CNN-CAE	0.10	2.31	0.12	2.16	0.12	2.20	0.13	2.04
SSA-DTCBiNet	0.08	2.62	0.08	2.56	0.08	2.58	0.09	2.51
SPO-MCBM	0.07	2.77	0.07	2.71	0.07	2.73	0.08	2.66
SEnO-DTCBiNet	0.10	2.31	0.07	2.70	0.08	2.62	0.07	2.83

BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; BT, Brain tumor; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

Table 10:

BraTS 2018 dataset-based T-test analysis

Methods	p-value (Precision)	T-statistic (Precision)	p-value (Accuracy)	T-statistic (Accuracy)	p-value (F1-score)	T-statistic (F1-score)	p-value (Recall)	T-statistic (Recall)
SSA-DTCBiNet	0.07	2.73	0.07	2.72	0.07	2.72	0.07	2.69
EDN-SVM	0.13	2.08	0.13	2.09	0.13	2.10	0.13	2.10
CJHBA-DRN	0.10	2.37	0.11	2.23	0.12	2.15	0.15	1.94
PatchResNet	0.11	2.23	0.11	2.29	0.10	2.31	0.10	2.37
Dense-CNN	0.17	1.80	0.17	1.78	0.18	1.77	0.19	1.68
BA-MCBM	0.08	2.66	0.08	2.67	0.08	2.67	0.08	2.67
DE-MCBM	0.07	2.78	0.07	2.75	0.07	2.73	0.08	2.66
2D-CNN-CAE	0.08	2.64	0.08	2.68	0.07	2.69	0.07	2.72
SPO-MCBM	0.07	2.78	0.07	2.72	0.08	2.68	0.08	2.54
SEnO-DTCBiNet	0.07	2.72	0.07	2.73	0.07	2.72	0.08	2.66

BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; CNN, convolutional neural network; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

Table 11:

Real-time dataset-based T-test analysis

Methods	p-value (Precision)	T-statistic (Precision)	p-value (F1-score)	T-statistic (F1-score)	p-value (Accuracy)	T-statistic (Accuracy)	p-value (Recall)	T-statistic (Recall)
PatchResNet	0.10	2.30	0.10	2.35	0.10	2.34	0.10	2.39
EDN-SVM	0.11	2.25	0.14	2.03	0.13	2.10	0.17	1.83
CJHBA-DRN	0.11	2.29	0.09	2.51	0.09	2.45	0.08	2.63
SPO-MCBM	0.09	2.44	0.10	2.39	0.10	2.41	0.10	2.34
2D-CNN-CAE	0.10	2.35	0.10	2.34	0.10	2.34	0.10	2.32
BA-MCBM	0.10	2.32	0.11	2.25	0.11	2.27	0.12	2.19
DE-MCBM	0.11	2.29	0.12	2.20	0.11	2.23	0.12	2.12
SSA-DTCBiNet	0.09	2.44	0.12	2.18	0.11	2.26	0.14	1.96
Dense-CNN	0.13	2.04	0.14	1.97	0.14	2.00	0.17	1.79
SEnO-DTCBiNet	0.12	2.13	0.12	2.17	0.12	2.16	0.11	2.22

BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

p.
Comparative discussion

The increase in mortality rate due to BT has led to the development of effective systems that detect and classify the type of BT based on its origin. Although conventional approaches provide promising outcomes, their performance is hindered by increased complexity and inconsistent behavior. The Dense-CNN [1], due to its sophisticated architecture, leads to increased processing time and limited feature detection capability. While the EDN-SVM [11] improved upon this by utilizing a machine-learning architecture with a simple structure, its inability to handle structural patterns caused overfitting and computational inconsistencies. The CJHBA-DRN [4] is limited in recognizing the diverse features essential for discriminating among complex brain structures, leading to misclassification and diminished performance. The PatchResNet [7] used different layered patches along with pretrained models to detect BTs; however, it lacked optimization of the learning parameters, which reduced classification accuracy. The 2D-CNN-CAE [15] suffers from overfitting and underfitting issues due to the single-domain data used for classification. The BA-MCBM [29] and DE-MCBM [30], owing to the individual characteristics of their optimizers, tend toward premature convergence or getting stuck in local optima. These drawbacks are effectively tackled by the proposed SEnO-DTCBiNet by combining the OFA-based DW-Net for precise segmentation of tumor regions with deep BiLSTM layers for contextual understanding of complex structures. Additionally, the SEnO algorithm is used for hyperparameter tuning. These strategies allow the proposed SEnO-DTCBiNet to obtain improved classification results. The comparison of the SEnO-DTCBiNet with other approaches based on training percentage and K-fold across the BraTS 2018, BraTS 2020, BT, and real-time datasets is depicted in Tables 12 and 13.

Table 12:

Results of the SEnO-DTCBiNet and baseline models based on training percentage

Dataset	Metric	DE-MCBM	PatchResNet	EDN-SVM	Dense-CNN	2D-CNN-CAE	CJHBA-DRN	BA-MCBM	SSA-DTCBiNet	SPO-MCBM	SEnO-DTCBiNet
BraTS 2020 dataset	Precision (%)	91.11	87.90	85.33	83.42	89.36	87.33	90.22	90.67	95.49	97.30
BraTS 2020 dataset	Recall (%)	96.24	93.73	92.92	91.90	95.48	93.37	96.07	96.15	97.77	97.88
BraTS 2020 dataset	Accuracy (%)	93.67	89.89	87.40	86.19	90.72	88.81	91.24	92.46	96.53	97.17
BraTS 2020 dataset	F1-score (%)	93.61	90.72	88.96	87.45	92.32	90.25	93.05	93.33	96.62	97.25
BT dataset	Precision (%)	94.39	89.04	86.73	85.09	90.12	86.74	90.57	95.55	96.71	96.86
BT dataset	Recall (%)	96.15	93.14	92.51	91.74	94.74	92.97	95.83	97.33	98.52	98.72
BT dataset	Accuracy (%)	94.30	91.35	89.05	88.57	91.78	91.13	91.82	96.11	97.91	98.48
BT dataset	F1-score (%)	95.263	91.05	89.53	88.29	92.37	89.75	93.128	96.43	97.61	97.98
BraTS 2018 dataset	Precision (%)	93.14	91.65	89.95	89.33	92.76	90.09	93.08	93.49	93.89	94.16
BraTS 2018 dataset	Recall (%)	94.75	92.82	90.60	90.15	93.01	92.72	94.06	95.45	97.03	97.42
BraTS 2018 dataset	Accuracy (%)	95.59	94.05	89.91	88.16	94.13	91.21	94.30	95.69	96.80	97.76
BraTS 2018 dataset	F1-score (%)	93.94	92.23	90.27	89.74	92.88	91.39	93.56	94.46	95.43	95.76
Real-time dataset	Precision (%)	94.80	93.27	91.62	90.45	93.96	91.73	94.24	95.29	95.31	95.99
Real-time dataset	Recall (%)	95.88	94.57	92.93	92.56	94.88	93.85	95.13	96.89	97.03	97.10
Real-time dataset	Accuracy (%)	95.16	93.70	92.06	91.15	94.27	92.44	94.54	95.82	95.88	96.36
Real-time dataset	F1-score (%)	95.34	93.92	92.27	91.49	94.42	92.78	94.68	96.08	96.16	96.54

BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; BT, Brain tumor; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

Table 13:

Results of the SEnO-DTCBiNet model and comparative methods based on K-fold

Dataset	Metric	2D-CNN-CAE	DE-MCBM	Dense-CNN	CJHBA-DRN	PatchResNet	EDN-SVM	BA-MCBM	SSA-DTCBiNet	SPO-MCBM	SEnO-DTCBiNet
BraTS 2020 dataset	Precision (%)	92.50	93.65	87.57	89.81	92.24	89.33	93.22	95.11	95.39	95.53
BraTS 2020 dataset	Recall (%)	94.14	97.38	87.89	92.94	93.26	91.92	95.28	97.55	97.69	97.75
BraTS 2020 dataset	Accuracy (%)	93.05	94.89	87.68	90.85	92.58	90.19	93.91	95.92	96.16	96.27
BraTS 2020 dataset	F1-score (%)	93.31	95.48	87.73	91.35	92.75	90.61	94.24	96.31	96.53	96.63
BT dataset	Precision (%)	93.53	93.73	89.73	91.05	91.37	90.62	93.59	94.05	94.60	95.49
BT dataset	Recall (%)	97.07	97.47	88.49	94.49	96.00	93.23	97.34	97.63	97.69	97.72
BT dataset	Accuracy (%)	94.71	94.97	89.32	92.19	92.91	91.49	94.84	95.24	95.63	96.24
BT dataset	F1-score (%)	95.27	95.56	89.11	92.74	93.63	91.91	95.43	95.81	96.12	96.59
BraTS 2018 dataset	Precision (%)	93.66	94.56	90.35	91.85	93.03	91.25	94.04	94.83	94.95	95.37
BraTS 2018 dataset	Recall (%)	92.18	93.54	86.64	91.10	91.81	86.95	93.16	94.13	94.43	95.80
BraTS 2018 dataset	Accuracy (%)	93.17	94.22	89.11	91.60	92.62	89.81	93.75	94.59	94.78	95.51
BraTS 2018 dataset	F1-score (%)	92.92	94.05	88.46	91.48	92.42	89.05	93.60	94.48	94.69	95.58
Real-time dataset	Precision (%)	93.50	94.39	89.65	90.83	92.41	90.03	93.54	94.55	94.61	95.78
Real-time dataset	Recall (%)	93.96	97.34	84.71	92.04	93.96	91.33	96.47	97.45	97.51	97.87
Real-time dataset	Accuracy (%)	93.66	95.38	88.00	91.23	92.92	90.46	94.52	95.52	95.58	96.48
Real-time dataset	F1-score (%)	93.73	95.84	87.11	91.43	93.17	90.67	94.99	95.98	96.04	96.82

BA-MCBM, Bat-Algorithm-based multihead convolutional bidirectional memory; BT, brain tumor; CJHBA-DRN, Chronological Jaya Honey Badger Algorithm-based Deep Residual Network; CNN, convolutional neural network; DE-MCBM, Dolphin swarm algorithm-based multihead bidirectional convolutional Memory; SPO-MCBM, Sonar Prey Optimized Convolutional Bidirectional Memory; SEnO-DTCBiNet, sonar energy optimized distributed tetra head attention-based convolutional bidirectional network; SSA-DTCBiNet, Sparrow Search Algorithm-based distributed tetra head attention-based convolutional bidirectional network.

VI.
Conclusion

The SEnO-DTCBiNet framework introduced for BT classification leverages sophisticated segmentation and DL strategies, thus ensuring early and timely diagnosis. The framework incorporates the OFA-based DW-Net for segmenting complex brain structures, the tetra head attention to discriminate abstractive features, and the SEnO optimization for enhancing the feature representation. Moreover, the deep architecture and the BiLSTM networks enable long-range relationships and structural patterns to be extracted more effectively. Additionally, the SEnO algorithm used for tuning the parameters of the model enhances the model performance by finding the optimal solution. The MSMSD feature extraction process captures comprehensive tumor characteristics, enhancing classification performance. The integration of these strategies allows SEnO-DTCBiNet to obtain effective outcomes in differentiating tumor types and provides an automated solution for diagnosis, ensuring early intervention and treatment. During validation on the BraTS 2020 dataset, the proposed SEnO-DTCBiNet showed superior performance by acquiring an accuracy of 97.18%, precision of 97.31%, recall of 97.89%, and F1-score of 97.26%, reflecting its proficiency against traditional approaches. The SEnO-DTCBiNet framework is limited by its opacity and lack of inherent interpretability; in addition, the framework's multicomponent complexity risks overfitting. Future work will focus on integrating feature selection strategies and explainable AI techniques that provide clear visual justifications for the model's decisions, fostering trust and transparency for medical professionals. The SEnO-DTCBiNet framework could also be extended to integrate multimodal images, such as MRI and CT scans, by developing a robust image fusion strategy; this would unlock richer information from different imaging techniques, thereby improving diagnostic accuracy and robustness.

Language: English
Submitted on: Aug 22, 2025 | Published on: Jan 30, 2026

© 2026 Aasha Mahesh Chavan, Vanita Mane, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.