The Internet of Medical Things (IoMT) is revolutionizing healthcare by incorporating smart devices and systems, enabling seamless data collection, transmission, and analysis. However, challenges such as bandwidth constraints, data loss, latency, and security concerns hinder the efficient transmission of medical data, particularly CT lung images, necessitating innovative solutions [1]. This research introduces an optimized framework using Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models to address these limitations.
Current methods for transmitting CT lung images in IoMT environments face several challenges related to data compression, encryption, and network performance. Lossless compression techniques, while preserving image quality, result in large file sizes, which increase transmission time and place a significant burden on storage systems [2]. On the other hand, lossy compression methods, which achieve smaller file sizes by discarding data, compromise diagnostic accuracy and are unsuitable for medical applications where precision is critical [3]. Encryption methods in current systems, while securing data, often incur computational overhead that introduces latency, especially when dealing with large image files or resource-constrained devices like wearable sensors [4]. Furthermore, IoMT networks frequently experience unstable connectivity, leading to data loss, delays, and increased latency, which are detrimental to time-sensitive medical applications. These challenges highlight the need for optimized frameworks that balance high image quality, secure data transmission, and efficient resource utilization to ensure fast and reliable CT lung image transmission [5].
This research addresses the limitations of existing methods by combining CNNs and LSTMs for efficient and secure transmission of CT lung images. CNNs are highly effective at extracting spatial features and reducing dimensionality, while LSTMs excel at modeling sequential dependencies, making them well suited for predictive optimization in data transmission [6]. The proposed model aims to enhance transmission efficiency, ensure reliable image reconstruction, and reduce computational overhead, addressing critical gaps in IoMT-based CT lung image transmission systems [7–9]. The key contributions of this research include:
Development of an optimized model to improve bandwidth utilization and reduce transmission time.
Assurance of high-quality image reconstruction, evaluated through metrics like PSNR and accuracy.
Integration of lightweight encryption mechanisms to enhance data security with minimal computational impact.
Comprehensive evaluation across various metrics such as specificity, sensitivity, encryption time, decryption time, and bandwidth optimization.
The study is organised in the following manner: Section 2 discusses relevant studies and challenges of traditional systems. Section 3 details the recommended CNN-LSTM framework. Section 4 describes the experimental setup and performance measures. Section 5 presents the findings and discussion, concluding with future enhancements.
Balhareth et al. (2024) [10] introduced an optimized intrusion detection system (IDS) for IoMT networks, aiming to improve the detection accuracy of malicious activities in medical device networks. Their system utilizes tree-based Machine Learning (ML) classifiers integrated with filter-based feature selection methods, including Mutual Information (MI) and XGBoost, to improve performance while reducing computational costs. The IDS monitors unauthorized activities during the transmission of healthcare data and attains 98.79% accuracy with a low false alarm rate (FAR) of 0.007 on the CICIDS2017 dataset. However, the research focuses solely on binary classification for intrusion identification, with plans for future work to implement a multi-classification approach for attack detection and categorization. Additionally, the proposed feature selection technique needs further empirical evaluation in real-world IoMT scenarios to fully assess its applicability across various machine learning classifiers.
Rahman et al. (2024)[11] conducted a comprehensive survey on the utilisation of ML and DL methods in smart healthcare. The study explored the recent advancements, applications, challenges, and future prospects of these technologies in the healthcare sector. The authors highlighted the significant role of ML and DL in addressing various healthcare challenges, like disease prediction, drug discovery, and medical image analysis. They emphasized the integration of ML and DL techniques to enhance healthcare systems, offering a detailed review of their applications, like ML-healthcare, DL-healthcare, and ML-DL-healthcare. Despite the promising potential, the authors pointed out the challenges related to data privacy, model interpretability, and the demand for large, high-quality datasets.
Nandagopal et al. (2024)[12] constructed a Deep Auto-Optimized Collaborative Learning (DACL) model to identify chronic diseases using Artificial Intelligence (AI) and IoMT networks. This AI-IoMT framework is designed to enhance disease diagnosis by leveraging interconnected sensors in healthcare environments. The DACL model uses a Deep Auto-Encoder Model (DAEM) to preprocess and impute missing data, while the Golden Flower Search (GFS) method is employed to choose optimal features for classification. Furthermore, the Collaborative Bias Integrated GAN (ColBGaN) model is used to classify chronic diseases such as heart disease, diabetes, and stroke from patient medical records. The Water Drop Optimization (WDO) method is applied to minimize classifier error. The effectiveness of the DACL model was demonstrated using various benchmarking datasets, showing superior outcomes in terms of accuracy and efficiency. However, the model's reliance on multiple optimization techniques and complex preprocessing steps may increase computational costs and hinder its scalability in real-time applications.
Datta Gupta et al. (2023)[13] introduced a novel lightweight DL-based approach, "ReducedFireNet," for histopathological image classification in the context of the Internet of Medical Things (IoMT). This model addresses the challenge of timely disease diagnosis, particularly for life-threatening conditions like cancer, by providing real-time, auto-analysis of histopathological images. The study highlights the increasing importance of IoMT in healthcare, offering the potential for effective disease identification and treatment. The proposed model achieved a mean accuracy of 96.88% and an F1 score of 0.968 when evaluated on a real histopathological dataset, showcasing impressive performance in terms of accuracy. Moreover, the model's lightweight design, with a size of only 0.391 MB and a computational requirement of 0.201 GFLOPS, makes it ideal for integration into IoT-based imaging devices. However, despite its promising results, the model's application is limited to histopathological images, and its scalability to other medical imaging domains remains unaddressed, which could be a potential area for further exploration.
Lata and Cenkeramaddi (2023) [14] conducted a comprehensive review on DL for medical image cryptography, highlighting the significance of securing electronic health records (EHRs) within the healthcare sector's IoMT systems. They emphasize the increasing complexity of ensuring the privacy, integrity, and availability of EHRs due to the ongoing digital transformation. The paper discusses several imaging modalities, such as PET, MRI, ultrasonography, CT, and X-ray, which are crucial for medical diagnosis. These images are processed for tasks like segmentation, feature selection, and denoising, while cryptography techniques are explored as solutions to secure sensitive medical data during storage and transmission. However, the challenges and limitations in adopting DL for medical image cryptography, like computational complexity, the need for large datasets, and the potential trade-offs between security and performance, are also noted as areas for further research.
Thimmapuram et al. (2023) [15] developed a DL-based approach for medical image classification utilising feature extraction techniques in IoMT. The aim of the study was to utilize Artificial Intelligence (AI) and DL to assess both real-time and historical data from IoT devices, particularly for predicting and identifying serious diseases in the early stages of diagnosis. Medical image classification plays a significant role in the diagnostic process, but it faces limitations due to the complexity of classifying images based on efficient features. The authors tested their proposed model using MATLAB and achieved impressive results, with 97.71% accuracy on a brain dataset and 97.2% accuracy on an Alzheimer’s disease dataset. Despite the high performance of the algorithm, the study highlights the need for further improvement in terms of accuracy, precision, and computational speed, especially for clinical applications, and proposes the use of soft sets for better convergence in classification tasks. However, the model may still face limitations in real-time deployment due to computational efficiency and the need for more robust feature extraction methods.
Li et al. (2023) [16] provide a comprehensive review on the application of DL techniques in medical image analysis, focusing on their potential to revolutionize healthcare outcomes through real-time analysis of complex datasets. The study categorizes recent DL approaches into five major techniques: CNN, Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), LSTM and hybrid models. The authors critically assess these techniques based on several parameters, including their principles, advantages, limitations, and the datasets used in their implementation. Python emerged as the most commonly used programming language for these methods, with most of the reviewed papers published in 2021, highlighting the recent surge in research. Despite the progress, the review points out significant challenges in the widespread adoption of DL in medical image analysis, including computational complexity, the need for large, high-quality datasets, and issues with generalizability across diverse clinical settings. The evaluation metrics across the studies include accuracy, sensitivity, specificity, F-score, and robustness, but limitations related to the practical implementation and scalability of these methods in real-world healthcare environments remain a major concern.
Tiwari et al. (2022) [17] introduce a brownfield IoMT network for imaging data, designed to scale with the objectives, functional requirements, and number of connected facilities and devices. They employ the DenseNet-201 architecture to generate image descriptors, and an optimized Deep Neural Network, tuned using Differential Evolution, is employed for classification to attain the best performance. The model is validated on three publicly available datasets—Brain Tumor MRI, Covid-19 Radiography, and Breast Cancer MRI—and achieves impressive results, with an accuracy of at least 97.28%. Despite its high accuracy and fast convergence rate, the framework's reliance on DenseNet-201 and Differential Evolution may limit its generalization ability and computational efficiency, especially when applied to larger, more complex datasets or real-time systems.
Balaji et al. (2022)[18] propose an optimization-driven deep belief neural network (ODBNN) for medical image classification within IoT-based healthcare systems. This model leverages a combination of ML, DL and AI techniques to address the challenges of high intra-class variance and inter-class similarity inherent in medical image processing. The ODBNN involves image quality enhancement techniques like noise removal and contrast normalization, followed by feature extraction. The May Fly optimization technique is then applied to select the most relevant features for classification. The model is assessed on accuracy, precision, recall, and f-measure, with findings indicating its superiority over conventional techniques. However, the approach’s reliance on multiple optimization and feature extraction techniques could increase computational complexity, potentially making it less efficient for real-time medical image processing applications.
Zhang et al. (2020) [19] proposed a joint DL- and IoMT-driven architecture for managing and monitoring the massive medical data generated by wearable devices, specifically focusing on cardiac image processing. The authors introduced a self-adaptive power control-based enhanced energy-aware (EEA) framework aimed at reducing energy consumption, enhancing battery lifetime, and improving reliability in wearable devices used for elderly healthcare. Their approach was evaluated using real-time data traces from static (sitting) and dynamic (cycling) activities. Furthermore, the paper proposed a layered architecture for IoMT, a battery model considering wireless channel features and body postures, and optimized network performance by addressing challenges such as energy drain, packet loss ratio (PLR), and sustainability. Experimental results demonstrated that the EEA scheme outperforms conventional constant transmit power control (TPC), significantly enhancing energy efficiency and reliability. Despite the promising results, the paper does not discuss the scalability of the framework in larger, more complex healthcare environments, which remains a key challenge for real-world implementation.
The IoT data analysis workflow begins with raw IoT data collection, followed by a pre-processing stage for data cleaning and normalization. Feature extraction is performed using a dual-model approach combining CNN and LSTM networks, where the CNN handles spatial features while the LSTM processes temporal patterns [20]. The extracted features undergo optimized model training with hyperparameter tuning for enhanced performance. The trained model generates predictions, which are finally transmitted to cloud infrastructure for storage and deployment. This streamlined workflow ensures efficient processing of IoT data while maintaining scalability and flexibility for various applications.

Architecture for the recommended approach.
The data for this research is collected using CT imaging sensors integrated into wearable Internet of Medical Things (IoMT) devices, designed to monitor and capture lung health data in real time. These sensors continuously acquire detailed CT lung images, providing critical insights into the structural and functional characteristics of the lungs. The CT imaging process involves emitting low-dose X-rays, which pass through the patient’s chest to create cross-sectional images of the lungs. These images are then transmitted to a central processing unit, where they are digitized for further analysis [21].
In this study, the collected CT lung data is sourced from wearable IoMT-based devices, enabling continuous and non-invasive monitoring of lung conditions. The imaging process ensures high-resolution data acquisition to capture even minor anomalies or changes in lung structure with precision. To maintain the quality and consistency of the data, the CT lung images are pre-processed to eliminate noise and artifacts, such as motion blur or device interference, which may compromise the accuracy of subsequent analysis.
CT lung images are often susceptible to various types of noise and artifacts that can significantly distort the quality and clarity of the images, affecting diagnostic accuracy. Common sources of noise in CT lung images include:
- Motion Artifacts:
Patient movement during image acquisition can lead to blurred or distorted CT images, making it difficult to interpret fine lung structures accurately.
- Respiratory Artifacts:
Breathing during scanning can cause inconsistencies in the image, introducing noise that interferes with identifying abnormalities.
- Beam Hardening:
This artifact occurs due to variations in X-ray attenuation across tissues, resulting in streaks or distortions in the CT image.
- Scattering Noise:
X-rays scattering within the patient’s body can produce noise, reducing image contrast and obscuring subtle features critical for diagnosis.
To address these issues, noise reduction techniques are applied during the pre-processing phase. The goal is to eliminate unwanted disturbances while preserving critical information in the lung images. One of the most widely used techniques for noise reduction in CT lung data is Gaussian Filtering.
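As an illustration, Gaussian filtering can be sketched in a few lines of NumPy. This is a minimal sketch, not the implementation used in the study; the kernel size, sigma, and synthetic noisy "slice" below are placeholder assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # Build a normalized 2-D Gaussian kernel (sums to 1)
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()

def gaussian_filter(image, size=5, sigma=1.0):
    # Smooth by sliding the kernel over the image (edge-replicated padding)
    kernel = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * kernel)
    return out

# Synthetic "CT slice": constant tissue value plus additive noise
rng = np.random.default_rng(0)
noisy = np.full((32, 32), 100.0) + rng.normal(0.0, 20.0, (32, 32))
smoothed = gaussian_filter(noisy, size=5, sigma=1.0)
```

Because the kernel is normalized, the filter suppresses pixel-level noise while leaving the overall intensity level essentially unchanged, which is why it preserves the gross anatomy of the image.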
Normalization is another critical pre-processing step to ensure CT lung images are on a consistent scale before inputting them into machine learning or deep learning models. CT images may exhibit varying intensity levels due to differences in scanning protocols, patient anatomy, or device settings. These variations can hinder the ability of models to generalize across different datasets or patient populations.
To address this, normalization techniques are applied to standardize the intensity values of CT images, ensuring that all inputs are within a predictable range. One of the most common normalization methods for CT lung images is Min-Max Normalization.
Min-Max Normalization scales the pixel intensity values of CT images to a fixed range, typically [0, 1]. This normalization shifts the minimum intensity to 0 and the maximum to 1, scaling all intermediate values proportionally. For example, if the raw pixel values in a CT lung image range from -1000 Hounsfield units (HU) to 3000 HU, applying Min-Max normalization will transform the pixel values into the range [0, 1]. By normalizing CT lung images, the consistency of data across different patients or imaging systems is ensured, reducing discrepancies and improving the stability of the training process. This leads to better algorithm performance and more reliable predictions.
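The HU example above can be expressed as a short sketch; the [-1000, 3000] HU window mirrors the example in the text, and the tiny 2×2 "image" is purely illustrative.

```python
import numpy as np

def minmax_normalize(img, lo=-1000.0, hi=3000.0):
    # Clip to the expected HU window, then rescale to [0, 1]
    img = np.clip(img, lo, hi)
    return (img - lo) / (hi - lo)

# Toy 2x2 "image" spanning the full HU window from the text
ct = np.array([[-1000.0, 0.0],
               [1000.0, 3000.0]])
norm = minmax_normalize(ct)
```

After normalization the minimum HU value maps to 0, the maximum to 1, and intermediate values scale proportionally (0 HU maps to 0.25 here), which keeps inputs from different scanners on a common scale.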
Feature extraction is a critical step in the analysis and transmission of CT lung images, as it enables the identification and isolation of diagnostically relevant patterns while minimizing computational overhead. The extracted features play a significant role in preserving the diagnostic quality of the images during transmission and ensuring efficient processing in IoMT-based systems.
CNNs are a class of DL models that have shown significant success in image and signal processing tasks such as object detection, image classification, and pattern recognition. They are particularly well suited to spatially structured data, making them a natural choice for medical imaging tasks such as CT lung image analysis.
CNNs are designed to automatically learn spatial hierarchies of features from raw input data, reducing the need for manual feature extraction. This capability makes them highly efficient at analyzing complex data such as CT lung images, where crucial features like nodules, lesion boundaries, and tissue textures must be extracted and identified with high accuracy.
The convolutional layer is the primary part of CNNs. It applies a set of learnable filters (also known as kernels) to the input data. These filters slide over the input signal (or image) and compute the convolution operation at each location, producing feature maps that highlight important spatial patterns in the data. Each filter learns to detect different features, like edges, textures, or more complex patterns, depending on the training process.
In the context of CT lung image analysis, convolutional filters are trained to identify key structures in the image, such as nodule boundaries, vascular patterns, and tissue densities, which are critical for detecting and characterizing lung abnormalities.
After the convolution operation, an activation function is utilised to introduce non-linearity to the model, allowing it to learn complex patterns. The most commonly used activation function in CNNs is the Rectified Linear Unit (ReLU). ReLU transforms all negative values in the feature map to zero, helping the model converge faster and avoid vanishing gradients.
The pooling layer reduces the spatial dimensions of the feature maps, which helps reduce computational complexity and the number of parameters in the model. It also makes the network invariant to small translations in the input data. The most common pooling operation is max pooling, which takes the maximum value in a specific window of the feature map. By down-sampling the feature map, pooling reduces its size while retaining the most important features.

Architecture for CNN
In CT lung image analysis, pooling layers help simplify the learned features by reducing the dimensionality of the data while maintaining the characteristics crucial for further analysis. CNNs are therefore a powerful tool for CT image processing in IoMT applications: their ability to automatically learn relevant features, detect patterns at different scales, and handle noisy data makes them a strong fit for analyzing CT lung images and improving lung disease diagnosis, monitoring, and real-time patient care. By integrating CNNs into IoMT devices, healthcare providers can achieve more precise and timely diagnoses, ultimately improving patient outcomes.
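The convolution, ReLU, and max-pooling operations described above can be illustrated with a minimal NumPy sketch. The 2×2 kernel and 6×6 input are toy values chosen for a predictable result; in a real CNN the filters are learned during training.

```python
import numpy as np

def conv2d(x, kernel):
    # Valid cross-correlation (the "convolution" used by most DL frameworks)
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # Zero out negative activations
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    # Non-overlapping max pooling; trailing rows/cols are dropped
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Toy input whose intensity increases left-to-right by 1 per pixel
image = np.arange(36, dtype=float).reshape(6, 6)
edge_filter = np.array([[-1.0, 1.0],
                        [-1.0, 1.0]])  # responds to horizontal gradients
fmap = max_pool(relu(conv2d(image, edge_filter)))
```

Here the filter responds uniformly to the constant left-to-right gradient, ReLU passes the positive responses through, and pooling halves the spatial resolution while keeping the strongest activations.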
LSTM is a type of Recurrent Neural Network (RNN) developed to overcome the limitations of traditional RNNs, particularly the vanishing gradient problem [22]. LSTMs are especially effective at capturing long-range dependencies in sequential data, making them well suited for tasks such as time series prediction, speech recognition, and intrusion detection in IoT networks. LSTMs learn to retain information over long periods while discarding irrelevant data, which they achieve by introducing memory cells and gating mechanisms that control the flow of information.

Architecture for LSTM
The forget gate controls how much of the previous cell state (Ct−1) should be carried forward to the current cell state. It takes the previous hidden state (ht−1) and the current input (xt) as inputs and outputs a value between 0 and 1, which is then used to scale the previous cell state:

ft = σ(Wf · [ht−1, xt] + bf)
Where ft is the forget gate’s output, Wf is the weight matrix for the forget gate, bf is the bias term and σ is the sigmoid activation function, which squashes the output between 0 and 1.
The input gate determines how much of the new information should be added to the cell state, while the candidate cell state represents the potential new information that can be added. The input gate and candidate cell state are computed as follows:

it = σ(Wi · [ht−1, xt] + bi),  C̃t = tanh(WC · [ht−1, xt] + bC)
Where it is the input gate’s output, Wi is the weight matrix for the input gate and bi is the bias term.
The cell state is updated by combining the previous cell state (Ct−1) and the new candidate cell state:

Ct = ft × Ct−1 + it × C̃t
Where Ct is the updated cell state, ft controls how much of the previous cell state is carried forward and it controls how much of the new candidate cell state is added to the cell state.
The output gate decides what the next hidden state (ht) will be. The hidden state is based on the updated cell state (Ct), which is squashed using the tanh function and scaled by the output gate:

ot = σ(Wo · [ht−1, xt] + bo),  ht = ot × tanh(Ct)
Where ot is the output gate’s output, ht is the hidden state, Wo is the weight matrix for the output gate, bo is the bias term and the tanh (Ct) function squashes the cell state to ensure that the hidden state is between -1 and 1.
The forget gate (ft) regulates which information from the previous cell state is retained.
The input gate (it) controls what new information is added to the cell state.
The output gate (ot) determines the next hidden state, which is used as the output of the LSTM unit.
The cell state (Ct) is updated by combining the previous state and the new information, allowing the LSTM to remember long-term dependencies.
These gates and states allow LSTMs to effectively remember and forget information over long sequences, making them suitable for tasks such as time series prediction, language modelling, and intrusion detection in IoT networks, where temporal patterns play a crucial role in identifying attacks.
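The gate equations above can be collected into a single NumPy step function. This is an illustrative sketch: the dimensions and random weights below are placeholders, not the trained parameters of the proposed model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # One LSTM time step following the gate equations in the text
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate f_t
    i = sigmoid(W["i"] @ z + b["i"])         # input gate i_t
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c = f * c_prev + i * c_tilde             # updated cell state C_t
    o = sigmoid(W["o"] @ z + b["o"])         # output gate o_t
    h = o * np.tanh(c)                       # new hidden state h_t
    return h, c

# Illustrative dimensions and random (untrained) weights
rng = np.random.default_rng(1)
n_in, n_hidden = 3, 4
W = {g: rng.normal(0.0, 0.1, (n_hidden, n_hidden + n_in)) for g in "fico"}
b = {g: np.zeros(n_hidden) for g in "fico"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for t in range(5):  # run over a short random input sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```

Note that the hidden state is always bounded in (−1, 1) because it is the product of a sigmoid output and a tanh of the cell state, which is what keeps LSTM activations numerically stable over long sequences.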
The Hippopotamus Optimization (HO) approach is a population-based optimization technique in which the individuals of the population are represented as hippopotamuses. Each hippopotamus corresponds to a candidate solution of the optimization task, with its position in the search space encoding the values of the decision variables. In the initialization phase, a vector of decision variables is generated using a specific formula.
The lower and upper bounds of the j-th decision variable are denoted lbj and ubj, respectively. Let N be the size of the hippopotamus population, and let m denote the number of decision variables in the given problem. The population matrix is constructed according to the expression in Eq. (7).
Hippopotamus groups consist of several adult females, young calves, multiple adult males, and a dominant male who leads the herd. The dominant male is identified through an iterative assessment of the criterion function value (for minimization problems this corresponds to the lowest value, while maximization problems aim for the highest). Typically, hippopotamuses gather in close quarters. The dominant male plays a key role in safeguarding the herd and its territory against threats, and female hippos are situated near the adult males. Once male hippos reach maturity, the dominant male expels them from the herd. The position of the male hippopotamuses in the water body is mathematically expressed in Equation (8).
In this equation, the first term depicts the position of the male hippopotamus, while the second refers to the dominant hippopotamus's position. The vectors r1, …, r4 denote random values between 0 and 1, and r5 is another random value between 0 and 1, as shown in Equation (13). I1 and I2 are integers from 1 to 2, as defined in Equations (3) and (6). A further term represents the mean values of a randomly selected group of hippopotamuses, which has an equal chance of comprising the current hippopotamus (χi), while an additional coefficient is another random value between 0 and 1, as shown in Equation (10). In Equation (11), ϱ1 and ϱ2 are random integers that can be 1 or 0.
Equations (12) and (13) define the position of female or juvenile hippopotamuses within the group. Typically, juvenile hippopotamuses remain close to their mothers; however, due to their curiosity, they may periodically move away from the group or their mothers. If T exceeds 0.6, the juvenile hippopotamus has moved away from its mother and is considered to have detached from the herd entirely. Two further coefficients are random numbers or vectors selected from one of the five scenarios sketched in the corresponding equation, where r7 is a random value between 0 and 1. The position update for both male and female or juvenile hippopotamuses within the herd is described in Eqs. (14) and (15).
A further term represents the objective function value. Using the vectors I1 and I2 elevates the global search and improves exploration in the recommended approach.
Hippopotamuses typically live in groups for protection and safety: the size and strength of these groups deter predators, making them less likely to approach. Young hippos, driven by curiosity, may occasionally wander away from the safety of the herd, and because they are not as strong as adult hippopotamuses they are more vulnerable. Similarly, hippos that are ill are at a higher risk of becoming prey.
In this phase, hippopotamuses may exhibit a trait in which they move toward the predator, prompting it to retreat and thereby averting the potential danger. Equation (16) describes the position of the predator within the search space.
Equation (17) represents the distance between the i-th hippopotamus and the predator. In this phase, the hippopotamus adopts a defensive posture, determined by a protection factor that aids in shielding it from the predator. When this distance is smaller than the protection factor, the predator is dangerously close to the hippopotamus; the hippopotamus quickly turns toward the predator and advances toward it, aiming to force the predator to retreat. Conversely, when the distance is larger, the predator or intruder is farther from the hippopotamus's territory, as described in Eq. (18). In this situation, the hippopotamus still faces the predator but restricts its movement, signalling its presence within its domain without engaging in an immediate confrontation.
The hippopotamus position in this scenario refers to the case where the animal faces a predator. The vector RL represents a random variable following a Lévy distribution, which is used to model sudden shifts in the predator's location during an attack on the hippopotamus. The Lévy movement model is expressed by Eq. (19). Two of its terms represent random values within the range [0, 1], while ϑ is a constant set to 1.5; Γ denotes the Gamma function, and the remaining scale factor can be derived from Eq. (20). In Equation (19), one coefficient denotes a random value within the range 2 to 4, another represents a random number between 1 and 1.5, and D is a random value between 2 and 3; ℊ is a random variable within the interval −1 to 1, and r9 is a vector of random values of appropriate dimension. If the fitness value in Equation (20) exceeds F, the hippopotamus has been hunted, prompting the introduction of a new hippopotamus into the herd; otherwise, the hunter retreats and the hippopotamus rejoins the group. This second phase demonstrated notable enhancements in the global search mechanism; the first and second phases operate synergistically, ensuring a robust approach that minimizes the risk of getting stuck in local minima.
When a hippopotamus faces a group of predators or is unable to defend itself, it often attempts to move away from the threat, typically seeking refuge in a nearby lake or pond, since predators such as lions and spotted hyenas generally avoid water bodies. This strategy helps the hippopotamus find a safer location close to its original position. In Phase Three of the HO approach, this behaviour is reflected by enhancing the local search capabilities: a random location near the hippopotamus's current position is generated, and if this new position yields an improved cost function value, the hippopotamus is considered to have found a safer spot and moves to it. Here, t denotes the current iteration, and T denotes the maximum number of iterations.
In Equation (22), the updated term depicts the location of the hippopotamus after it has searched for the nearest secure area. A further coefficient is a random vector or value selected from among three potential options, as described in Equation (18); these available alternatives contribute to an effective local search, ultimately elevating the overall exploitation performance of the algorithm. In Equation (28), r11 denotes a stochastic vector with values between 0 and 1; similarly, r10 (as shown in Equation (27)) and r13 are random values generated within the 0 to 1 range, and r12 is a random number following a normal distribution.
In this research, Hippopotamus Optimization (HO), a metaheuristic algorithm, is used for hyperparameter tuning of the deep learning models. Hyperparameters are parameters set before training that significantly influence the model’s performance, including the learning rate, batch size, number of layers, and number of units in each layer. In the context of this study, the aim of hyperparameter tuning is to minimize the discrepancy between predicted and actual accuracy, ensuring that the model generalizes well to unseen data. This process identifies the set of hyperparameters that provides optimal performance in terms of accuracy and efficiency.
After feature extraction using CNNs and LSTM networks, hyperparameter tuning is applied in the dense layers of the model. These dense layers play a crucial role in final decision-making, where the extracted features are used to classify the input data. With the help of the Hippo Optimization Algorithm, the hyperparameters of these dense layers, such as the number of units, activation functions, and dropout rates, are optimized. This ensures that the model is not only capable of accurate feature extraction but also able to perform precise and efficient classification based on the tuned parameters.
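A highly simplified, illustrative analogue of this HO-based tuning loop is sketched below. It is not the exact update equations of the algorithm: movement toward the current best ("dominant") solution plus a shrinking local perturbation stands in for the three phases, and the toy quadratic objective stands in for the real validation loss of the CNN-LSTM model.

```python
import numpy as np

def hippo_optimize(objective, bounds, n_pop=20, n_iter=100, seed=0):
    # Simplified population-based search loosely inspired by the HO phases:
    # movement toward the dominant (best) solution, random exploration,
    # and a shrinking local-search ("safer spot") perturbation.
    rng = np.random.default_rng(seed)
    lb, ub = np.array(bounds, dtype=float).T
    dim = len(lb)
    pop = rng.uniform(lb, ub, (n_pop, dim))
    fit = np.array([objective(p) for p in pop])
    for t in range(1, n_iter + 1):
        best = pop[fit.argmin()]
        for k in range(n_pop):
            r = rng.random(dim)
            I = rng.integers(1, 3)  # analogue of the I1/I2 integers (1 or 2)
            cand = pop[k] + r * (best - I * pop[k])
            # shrinking local refinement (Phase-3 analogue)
            cand += rng.normal(0.0, 1.0, dim) * (ub - lb) * 0.1 * (1 - t / n_iter)
            cand = np.clip(cand, lb, ub)
            f = objective(cand)
            if f < fit[k]:  # greedy acceptance: keep only improvements
                pop[k], fit[k] = cand, f
    return pop[fit.argmin()], fit.min()

# Toy surrogate "validation loss" over (log10 learning rate, dropout rate)
best, loss = hippo_optimize(lambda p: (p[0] + 3.0) ** 2 + (p[1] - 0.5) ** 2,
                            bounds=[(-6.0, 0.0), (0.0, 0.9)])
```

In the actual framework, each candidate vector would encode dense-layer hyperparameters (units, activation choice, dropout rate) and the objective would be the model's validation error after a short training run.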
The complete solution was implemented on an Intel workstation equipped with a 3.2 GHz Core i7 CPU, an NVIDIA GPU, and 16 GB of RAM.
Evaluation measures including accuracy, precision, recall, specificity, and F1-score are computed to assess the performance of the proposed approach for CT lung image transmission within the IoMT environment. These metrics are benchmarked in real time against other advanced DL methods to underscore the advantages of the proposed CNN-LSTM framework. Both the performance metrics and latency overhead are critically analyzed to validate the framework's efficiency. Table 1 provides the mathematical formulas used for calculating these performance metrics.
Performance measures utilized in the examination
| SL.NO | Performance Measures | Expression |
|---|---|---|
| 1 | Accuracy | (TP + TN) / (TP + TN + FP + FN) |
| 2 | Recall | TP / (TP + FN) |
| 3 | Specificity | TN / (TN + FP) |
| 4 | Precision | TP / (TP + FP) |
| 5 | F1-Score | 2 × (Precision × Recall) / (Precision + Recall) |
TP and TN denote True Positive and True Negative; FP and FN denote False Positive and False Negative.
A prediction case can be broken down into four categories: First, we have TP, where the identified values are correct and align with the actual truth. Next, FP refers to instances where the values are incorrectly labelled as true, although they are, in fact, false. FN occurs when a true value is mistakenly identified as negative. Lastly, TN describes the scenario where a value is accurately classified as negative, corresponding to the actual negative status.
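These four outcome categories map directly onto the Table 1 metrics. A minimal helper computing them from raw counts (the counts passed in the usage note are illustrative, not results from the paper):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the standard classification metrics of Table 1
    from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)       # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}
```

For example, `classification_metrics(90, 85, 5, 10)` yields an accuracy of 175/190 and an F1-score of 180/195, since F1 simplifies to 2TP / (2TP + FP + FN).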
The effectiveness of the proposed DL framework for medical image transmission in the IoMT environment was evaluated using several key metrics, like Peak Signal-to-Noise Ratio (PSNR), accuracy, F1 score, specificity, and sensitivity. PSNR was used to assess the quality of the transmitted medical images, where higher values indicated minimal distortion and preserved diagnostic value. Accuracy measured the overall correctness of the model’s predictions, while the F1 score offered a balance between precision and recall, ensuring robust classification even in imbalanced datasets. Specificity and sensitivity evaluated the model’s ability to correctly recognize negative and positive cases, respectively, which is critical in medical diagnostics to avoid false positives and ensure early detection of abnormal conditions. The results demonstrated that the CNN-LSTM framework, optimized using Hippo Optimization for hyperparameter tuning, outperformed traditional models, achieving superior performance across these metrics. This showcases the framework’s effectiveness in ensuring efficient, secure, and reliable medical image transmission in IoMT environments, ultimately contributing to better real-time patient monitoring and care.
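The PSNR values reported here can be computed with a minimal sketch like the following; `max_val = 255.0` assumes 8-bit images, which may differ from the bit depth of the CT data used in the paper.

```python
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """Peak Signal-to-Noise Ratio (dB) between an original image and
    its reconstruction after compression/transmission."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR means lower mean-squared distortion; for instance, a uniform pixel error of 16 gray levels on 8-bit images corresponds to roughly 24 dB.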

Comparative analysis with existing approaches

Convergence Analysis for the Proposed approach
Performance metrics for the recommended approach
| Algorithms | Accuracy (%) | Precision (%) | Recall (%) | Specificity (%) | F1-score (%) |
|---|---|---|---|---|---|
| ANN | 84.3 | 86.3 | 86.5 | 86.5 | 85.3 |
| CNN | 87.2 | 87.2 | 86.6 | 86.3 | 83.2 |
| DNN | 88.5 | 88.1 | 84.3 | 88.6 | 86.3 |
| RNN | 89.9 | 90.1 | 89.74 | 88.3 | 88.2 |
| Proposed Model | 98.3 | 97.0 | 97.8 | 96.2 | 96.0 |
Performance Comparison between the recommended architecture and Traditional Model
| Metric | Proposed Framework | Existing Method (Baseline) |
|---|---|---|
| Compression Ratio | 4:1 | 3:1 |
| Encryption Time (ms) | 12 | 18 |
| PSNR (dB) | 38.5 | 34.2 |
| Accuracy (Diagnosis) | 98% | 95% |
| Bandwidth Savings (%) | 75% | 60% |
The recommended architecture outperforms the baseline method across key metrics, achieving a higher compression ratio (4:1 vs. 3:1) and improved PSNR (38.5 dB vs. 34.2 dB), ensuring superior image quality. Additionally, it demonstrates faster encryption (12 ms vs. 18 ms), greater diagnostic accuracy (98% vs. 95%), and significant bandwidth savings (75% vs. 60%), highlighting its efficiency and reliability.
This research introduces an optimized deep learning framework for medical image transmission in IoMT environments. The proposed solution incorporates CNNs for spatial feature extraction and dimensionality reduction, while LSTMs handle sequential data to overcome issues such as packet loss and latency in IoMT networks. The framework also includes robust encryption mechanisms to ensure data security without significantly increasing computational overhead. Performance evaluations using real-world medical image datasets under varying IoMT network conditions demonstrate that the proposed model excels across key metrics, including Peak Signal-to-Noise Ratio (PSNR), accuracy, F1 score, specificity, and sensitivity. Additionally, the model optimizes encryption and decryption times, reducing bandwidth consumption and ensuring efficient and secure data transmission. Future enhancements to improve real-time optimization, incorporate more advanced encryption techniques, and integrate multi-modal data will further refine the framework's performance and applicability. Overall, this study demonstrates a significant step toward improving the reliability and efficiency of medical image transmission in IoMT-based healthcare networks.