
Ensembled Combination of Q-Learning and Deep Extreme Learning Machine to Achieve High Performance and Low Latency for Large-Scale IoT and Fog Nodes

Open Access | Feb 2025

1. Introduction

The Internet of Things (IoT) has revolutionized the way devices and systems interact, enabling seamless connectivity and data exchange across diverse domains. From smart homes and healthcare to industrial automation, IoT facilitates the real-time collection and processing of data, driving innovation and efficiency [1-3]. Complementing this paradigm is fog computing, an extension of cloud computing that brings computational resources closer to the data sources. Fog computing reduces the dependency on centralized cloud infrastructures, significantly lowering latency and enhancing the performance of time-sensitive applications.

In parallel, the concept of Body Area Networks (BANs) has gained prominence, particularly in healthcare and wearable technology. BANs consist of interconnected sensors and devices operating in close proximity to the human body, enabling continuous monitoring of vital signs, activity levels, and environmental factors [4, 5]. IoT-enabled BANs further enhance this capability by integrating BANs into the broader IoT ecosystem, allowing for remote monitoring, real-time analytics, and adaptive decision-making. These advancements hold immense potential for applications such as telemedicine, fitness tracking, and personalized healthcare.

Body Area Networks (BAN) and IoT working together have greatly advanced task planning, implementation, and execution. IoT-enabled BAN devices, powered by batteries, connect with sensors, microcontrollers, and smartphones to gather patient health data and send it to the cloud via gateways. However, the limited battery life of these hubs makes continuous health monitoring challenging [6-9]. Fog and edge computing now serve as essential links between cloud systems and BAN-IoT devices, enhancing quality of service (QoS). Fog nodes are integrated into BAN-IoT devices to collect and process data locally, reducing the need for a centralized network [10].

Despite these advancements, the exponential growth of IoT devices and fog nodes presents significant challenges. Managing the vast volume of data generated by IoT-enabled BANs requires intelligent frameworks capable of optimizing resource allocation and ensuring low-latency processing. Traditional methods often struggle to meet the dual demands of high performance and scalability, especially in dynamic and resource-constrained environments [11-14].

To address these challenges, this paper proposes an ensembled framework combining Q-Learning and Deep Extreme Learning Machine (DELM). Q-Learning, a model-free reinforcement learning algorithm, excels in making adaptive decisions in uncertain environments, while DELM offers high-speed data processing and robust generalization. The integration of these techniques aims to optimize resource management and decision-making in IoT-enabled BANs and fog networks. The ensembled model leverages Q-Learning for dynamic workload distribution and DELM for efficient data processing, ensuring high performance and minimal latency.

1.2 Contribution of the Research

Motivated by the challenges in managing large-scale IoT and Fog networks, this research makes the following key contributions:

  • A novel framework combining Q-Learning and Deep Extreme Learning Machine (DELM) is proposed, aiming to optimize resource allocation, reduce latency, and improve scalability in complex IoT and Fog environments. This integrated approach leverages the strengths of both methods to address critical performance issues in real-time data processing.

  • Extensive simulations validate the framework’s performance, showing significant improvements in key metrics such as latency, resource utilization, and throughput, demonstrating its potential to handle the challenges of large-scale networks effectively.

  • The proposed framework is demonstrated to be an effective solution for managing large-scale IoT and Fog networks, offering a reliable, efficient, and scalable approach for real-time applications across diverse domains.

1.3 Structure of the Paper

The paper is structured as follows: Section II presents an overview of related works by various authors. Section III details the system design, dataset collection, preprocessing steps, and the proposed architecture. Section IV delves into the experimental setup, results, and performance evaluation. Finally, Section V concludes the paper and outlines potential directions for future improvements.

2. Related Works

Ramya and Ramamoorthy (2024) [15] developed a Q-based deep extreme neural network learning algorithm for smart healthcare applications, integrating prediction and resource allocation capabilities. Their system architecture consisted of four primary modules: data collection unit (DCU), data processing unit (DPU), intelligent prediction module (IPM), and adaptive resource allocation module (ARAM). The system was specifically designed for heart attack prediction using medical training data from the IoT layer. Through extensive experimentation, they demonstrated improved Quality of Service (QoS) with reduced latency in real-time monitoring scenarios. A notable limitation of the study is its lack of comprehensive comparative analysis with existing prediction systems and insufficient attention to scalability concerns in large-scale deployments.

Abirami and Poovammal (2024) [16] introduced HAWKFOGS, an innovative framework combining deep learning with edge and fog computing devices for cardiac problem diagnosis. Their approach utilized Logistic Chaos based Harris Hawk Optimized Enhanced Gated Recurrent Neural Networks for prediction, implemented on Embedded Jetson Nano devices. Data collection was performed using IoT devices interfaced with electrocardiography and blood pressure sensors. They demonstrated exceptional efficiency, with model building completed in reduced time. The research is constrained by its focus on specific hardware configurations and lacks validation across diverse healthcare environments and patient populations.

Singh and Singh (2023) [17] explored the convergence of machine learning in robotics with fog/cloud computing and IoT technologies. Their research highlighted how this combination enables intelligent and adaptive autonomous systems through distributed computing and real-time data processing. They addressed critical challenges including security, privacy, resource management, latency, bandwidth, interoperability, and energy efficiency. The study presented case studies demonstrating potential applications across various industries. Their work emphasized the importance of ethical principles and appropriate integration strategies. A significant drawback of the research is the absence of quantitative validation of the proposed solutions and real-world implementation data.

Chakraborty et al. (2023) [18] proposed a lightweight secure framework utilizing Deep Learning for detecting security attacks in IoT applications within fog computing environments. Their ANN-based model was strategically deployed with cloud-based training and fog node-based detection mechanisms. The framework achieved impressive results when tested on NSL-KDD datasets: 99.43% accuracy, 99.26% precision, and a minimal 0.7396% false alarm rate. Comparative analysis showed superior performance against baseline techniques including SVM, Logistic Regression, and Decision Trees. Their approach effectively addressed the security vulnerabilities common in dynamic fog computing environments. One key limitation lies in the framework's validation being restricted to standard datasets, lacking testing against emerging attack patterns and real-world security threats.

Elhadad et al. (2022) [19] developed a healthcare monitoring framework leveraging fog computing for real-time notification management. Their system continuously monitored vital signs including body temperature, heart rate, and blood pressure through wearable devices. The framework incorporated machine learning algorithms for anomaly detection and automated notification generation. They implemented features for both emergency alerts and routine medication reminders. The system demonstrated effective real-time communication between IoT devices and healthcare providers. The study's primary weakness is its lack of comprehensive performance metrics and insufficient attention to potential scalability issues in large-scale healthcare deployments.

Narayana and Patibandla (2021) [20] created a fog-based model (FBELPM) for secure data communication in IoT environments. Their JAVA-implemented system focused on establishing secure communication channels between IoT devices. The model incorporated strong security mechanisms while maintaining efficient data transmission capabilities. They presented a detailed layered approach to IoT-based fog computing implementation. The framework demonstrated effectiveness in reducing attacks while improving supervision quality. A major shortcoming of the research is the absence of quantitative performance analysis and comparison with existing security frameworks in real-world scenarios.

Zhang et al. (2021) [21] addressed distributed fog computing challenges through improved LT codes optimization for deep learning in IoT applications. Their approach focused on managing large-scale sensor data processing while considering potential fog node failures. The improved LTC-based scheme achieved simultaneous reduction in average overhead and degree. Their method demonstrated significant improvements in reducing latency and computation complexity in distributed environments. The research specifically addressed the challenges of processing big data in real-time with limited resources. The primary limitation emerges from insufficient validation in diverse network conditions and inadequate addressing of network heterogeneity impacts.

Abdel-Basset et al. (2021) [22] proposed an energy-aware model based on the marine predators algorithm (MPA) for optimizing task scheduling in fog computing. They developed three versions of the algorithm, with the improved MMPA showing superior performance metrics. Their approach effectively addressed energy consumption, makespan, flow time, and carbon dioxide emission concerns. The system demonstrated particular strength in handling discrete task scheduling challenges. Their research showed significant improvements over genetic algorithms and other metaheuristic approaches. The study falls short in its evaluation under dynamic workload conditions and real-time task arrival scenarios.

Bhandari et al. (2021) [23] developed a deep learning-based content caching method (DLCC) for fog-access points using 2D CNN architecture. Their approach specifically addressed the challenges of predicting and storing user content efficiently in dynamic environments. The system demonstrated superior performance in DL accuracy, mean square error, cache hit ratio, and overall system delay. Their method outperformed existing approaches including transfer learning-based cooperative caching and randomized replacement strategies. The framework showed particular effectiveness in addressing latency-related issues in 5G and beyond cellular communication. A critical gap in the research is its insufficient validation with diverse content types and user behavior patterns in real-world network conditions.

Rahman et al. (2020) [24] introduced a Deep Reinforcement Learning based computation offloading and resource allocation scheme for Fog Radio Access Networks (F-RANs). Their approach intelligently determined optimal task processing locations between device level, fog access points, and cloud servers. The system demonstrated significant improvements in minimizing latency and increasing throughput. Their framework effectively handled the joint optimization of mode selection, resource allocation, and power allocation. The approach showed particular strength in achieving suboptimal solutions for complex F-RAN environments. The research is particularly limited by its assumptions about network stability and insufficient consideration of dynamic user mobility patterns.

3. Proposed Methodology

The proposed IoT and Fog node management framework, shown in Figure 1, combines Q-Learning and Deep Extreme Learning Machine (DELM) to address challenges in large-scale IoT and Fog environments. The resource optimization component uses Q-Learning to allocate resources dynamically. By learning and adapting to changing network conditions, it improves resource utilization, reduces latency, and enhances decision-making. The data processing component employs DELM for fast and accurate handling of large-scale data. DELM processes complex datasets efficiently, meeting the real-time demands of IoT and Fog systems while supporting high throughput and scalability. The framework’s performance is validated through simulations representing large IoT and Fog networks. Key metrics like latency, resource usage, and throughput demonstrate significant improvements. By combining Q-Learning’s adaptability with DELM’s speed, the framework ensures low latency, efficient computation, and scalability. This approach highlights its suitability for real-time IoT and Fog applications, providing a reliable and efficient solution for managing large-scale networks.

Figure 1: Proposed Architecture

3.1 Materials and Methods

The dataset provides a comprehensive configuration of a smart healthcare system that integrates Body Area Networks (BAN), Fog gateways, and cloud connectivity. It includes data from 90 BAN nodes, each operating within a 4-12 meter range and initially powered with 0.002 Joules of energy. These nodes utilize 5G transceivers for wireless communication. The dataset captures data transmission metrics, supported by an uplink bandwidth of 250 Mbps and a downlink bandwidth of 125 Mbps, ensuring efficient real-time data transfer. It also outlines the specifications of 4 Fog gateways, each equipped with 3 GB of RAM, functioning as processing hubs for localized computation. The 8 recorded attributes likely represent various health metrics, such as heart rate, temperature, and other physiological indicators. This dataset is essential for analyzing energy consumption, optimizing routing algorithms, and evaluating the performance of the Fog-BAN-Cloud architecture in healthcare applications.

3.2 Data Preprocessing

The data preprocessing for the smart healthcare dataset follows a series of essential steps to ensure its readiness and accuracy for analysis. First, any missing values in attributes like health metrics or system parameters are identified and addressed using suitable imputation methods, such as mean, median, or mode imputation, based on the attribute's nature. The dataset is then normalized to standardize numerical features, such as energy levels, bandwidth, and RAM usage, ensuring consistency for model training. Categorical data, including device types or recorded metrics, are encoded using label encoding to facilitate integration with machine learning models. Outliers in attributes like energy consumption or bandwidth fluctuations are also detected and managed to prevent distortion in the analysis. Finally, the data is divided into training, validation, and testing sets to enable comprehensive model evaluation. These preprocessing steps ensure the data is well-prepared for advanced analyses, including routing optimization and energy efficiency evaluations in Fog-BAN-Cloud systems.
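A minimal sketch of such a pipeline is given below, assuming the records are loaded from a CSV file; the file name, the median/mode imputation choices, the percentile-based outlier clipping, and the 70/15/15 split ratios are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged preprocessing sketch: file name, imputation strategy, clipping
# thresholds, and split ratios are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn.model_selection import train_test_split

df = pd.read_csv("fog_ban_dataset.csv")  # hypothetical file name

# 1. Impute missing values: median for numeric, mode for categorical columns.
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
for c in cat_cols:
    df[c] = df[c].fillna(df[c].mode().iloc[0])

# 2. Manage outliers by clipping numeric columns to the 1st/99th percentiles.
df[num_cols] = df[num_cols].clip(df[num_cols].quantile(0.01),
                                 df[num_cols].quantile(0.99), axis=1)

# 3. Label-encode categorical attributes and normalize numeric features.
for c in cat_cols:
    df[c] = LabelEncoder().fit_transform(df[c])
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])

# 4. Split into training (70%), validation (15%), and testing (15%) sets.
train, rest = train_test_split(df, test_size=0.30, random_state=42)
val, test = train_test_split(rest, test_size=0.50, random_state=42)
```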

3.3 System Design

The system design integrates Q-Learning and Deep Extreme Learning Machine (DELM) to optimize resource allocation and enable high-speed data processing. It features two primary components: resource optimization for adaptive network management and data processing for efficient handling of large-scale IoT and Fog environments.

3.3.1 Q-Learning Concepts

Reinforcement learning (RL) plays a critical role in machine learning systems, particularly in artificial intelligence applications. In the context of Body Area Networks for the Internet of Things (BAN-IoT), Q-learning, a specialized area of RL, is widely adopted. The core concept of RL is based on sequential decision-making, where an agent interacts with an environment, selects actions, and, based on the feedback from the environment, receives rewards that guide the decision-making process [25]. This cycle continues until the agent reaches a conclusion or maximizes its objective. Figure 2 illustrates the basic framework of reinforcement learning.

Figure 2: Reinforcement learning framework

In RL, the agent's primary goal is to identify and perform the optimal actions to maximize the cumulative discounted reward over time. This process requires the agent to make informed decisions at each stage of the interaction, balancing immediate and long-term rewards. A policy, which represents the strategy the agent follows, defines the actions the agent will take in each state. The policy maps each state z_t at time step t to a corresponding action a_t. The primary aim of RL is to fine-tune the agent's policy such that the total long-term reward is maximized, essentially teaching the agent to make decisions that yield the most beneficial outcomes.

The effectiveness of RL largely hinges on the success of Q-learning, a model-free, off-policy algorithm that allows agents to learn from the environment without needing a predefined model of the environment. Q-learning is a leading algorithm in RL, known for its ability to approximate the value of state-action pairs by leveraging samples collected during interactions with the environment. The discrete-time Q-function, which is the key to understanding Q-learning, is mathematically represented in Equation (4).

In Q-learning, the decision-making process is formulated within the framework of a Markov Decision Process (MDP), which characterizes the environment's states, actions, rewards, and the associated transition probabilities. These elements are denoted as (S, A, P, R), where S represents the set of possible states, A is the set of actions, P is the probability distribution of transitions between states, and R is the reward function. Let z be the current state and z′ be the next state under action a. The transition probability is defined in Equation (1):

$$P_{zz'}^{a} = \mathrm{Prob}\left\{\, z_{t+1} = z' \mid z_t = z,\; a_t = a \,\right\} \tag{1}$$

The reward for the transition from state z_t to z_{t+1} under action a_t is denoted $R_{z_t z_{t+1}}^{a_t}$. The overall reward at a given state is the sum of the rewards over all possible next states, weighted by their transition probabilities, as shown in Equation (2):

$$R_t = \sum_{z_{t+1} \in Z} P_{z_t z_{t+1}}^{a_t} \, R_{z_t z_{t+1}}^{a_t} \;\Big|_{\, z_t = z,\; a_t = a} \tag{2}$$

Let the utility function under a policy ω, $Q^{\omega}(z, a)$, be defined as in Equation (3):

$$Q^{\omega}(z, a) = R_t + \gamma \sum_{z_{t+1} \in Z} P_{z\, z_{t+1}}^{a_t} \, Q^{\omega}(z_{t+1}, a) \tag{3}$$

Here γ is the discount factor in [0, 1]; in practice, γ is typically chosen in [0.5, 0.99]. The optimal Q-function then satisfies the Bellman optimality relation in Equation (4):

$$Q^{*}(z, a) = \max_{\omega} Q^{\omega}(z, a) = \left\{ R_t + \gamma \sum_{z_{t+1} \in Z} P_{z_t z_{t+1}}^{a_t} \max_{a'} Q^{\omega}(z_{t+1}, a') \right\} \Big|_{\, z_t = z,\; a_t = a} = \left\{ R_t + \gamma \sum_{z_{t+1} \in Z} P_{z_t z_{t+1}}^{a_t} \, Q^{*}(z_{t+1}, a') \right\} \Big|_{\, z_t = z,\; a_t = a} \tag{4}$$
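Operationally, Equation (4) is approximated from sampled transitions rather than computed from known transition probabilities. Below is a minimal tabular sketch of that iterative update; the state/action sizes, the learning rate alpha, and the epsilon-greedy exploration strategy are standard assumptions, not details taken from the paper.

```python
# Minimal tabular Q-learning sketch consistent with Equations (1)-(4).
# Sizes, alpha, and the epsilon-greedy policy are illustrative assumptions.
import numpy as np

n_states, n_actions = 50, 4            # hypothetical problem size
Q = np.zeros((n_states, n_actions))    # table of Q(z, a) estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # gamma in [0.5, 0.99] per the text

def choose_action(z):
    """Epsilon-greedy action selection over the current Q estimates."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)   # explore
    return int(np.argmax(Q[z]))               # exploit

def update(z, a, reward, z_next):
    """Move Q(z, a) toward R_t + gamma * max_a' Q(z', a'), per Equation (4)."""
    td_target = reward + gamma * np.max(Q[z_next])
    Q[z, a] += alpha * (td_target - Q[z, a])
```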

3.3.2 Deep Extreme Learning Machine

The Deep Extreme Learning Machine (DELM) extends the concept of the traditional Extreme Learning Machine (ELM) by incorporating deep learning principles, making it highly effective for complex data. DELM utilizes a deep architecture where multiple hidden layers are employed to extract hierarchical features from input data, enabling better performance in tasks such as classification and prediction. The key characteristic of DELM is the random initialization of weights in the hidden layers, similar to the original ELM, which avoids the need for backpropagation. This results in significantly faster training times compared to conventional deep learning models.

In DELM, as illustrated in Figure 3, the input data is passed through several hidden layers, each with randomly initialized weights. The output layer’s weights are computed analytically using a closed-form solution, based on the activations from the last hidden layer. The lack of backpropagation in DELM speeds up the training process, making it suitable for large-scale datasets and real-time applications. The model’s output is determined by applying a linear transformation to the activations of the hidden layers, and it minimizes the difference between the predicted and actual outputs through a least-squares solution.

Figure 3: DELM Framework

One of the main advantages of DELM is its computational efficiency, as it significantly reduces the time required for training compared to traditional neural networks. However, the random initialization of weights can sometimes result in suboptimal performance, particularly in more complex tasks. Despite this, DELM’s ability to handle large datasets with minimal training time and without extensive manual tuning makes it a strong choice for many machine learning applications, including image classification, time series prediction, and anomaly detection.
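The mechanics described above can be summarized in a short sketch: hidden layers with fixed random weights, followed by an output layer solved in closed form by least squares. The layer sizes, sigmoid activation, and pseudo-inverse solver below are illustrative assumptions, not the authors' exact configuration.

```python
# Compact DELM-style sketch: random (untrained) hidden layers plus an
# analytically solved output layer; architecture choices are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def delm_fit(X, Y, hidden_sizes=(128, 64)):
    """Fix random hidden weights, then solve output weights by least squares."""
    weights, H = [], X
    for h in hidden_sizes:
        W = rng.standard_normal((H.shape[1], h))  # random, never trained
        b = rng.standard_normal(h)
        weights.append((W, b))
        H = sigmoid(H @ W + b)                    # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y                  # closed form, no backprop
    return weights, beta

def delm_predict(X, weights, beta):
    """Forward pass through the fixed layers and the learned output layer."""
    H = X
    for W, b in weights:
        H = sigmoid(H @ W + b)
    return H @ beta
```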

3.4 Ensemble Combination of Q-Learning and Deep Extreme Learning Machine (DELM)

The combination of Q-Learning and Deep Extreme Learning Machine (DELM) aims to leverage the strengths of both reinforcement learning and deep learning to create a highly efficient and adaptive system. By integrating these two techniques, the system can optimize resource allocation, enhance decision-making, and process large-scale IoT and Fog environments efficiently.

The Q-Learning component, as discussed, forms the foundation of the system’s decision-making process. It utilizes a sequential decision-making framework where an agent interacts with the environment, taking actions based on feedback in the form of rewards. This interaction helps the agent learn a policy that maximizes the cumulative reward over time, ultimately guiding the agent to make optimal decisions. In the context of IoT and Fog environments, Q-Learning is used for adaptive network management, enabling the system to allocate resources dynamically based on real-time feedback and performance metrics.

However, as the complexity of the environment grows, Q-Learning faces challenges in processing high-dimensional input data, such as that generated by sensors in IoT systems. This is where Deep Extreme Learning Machine (DELM) comes into play. DELM extends traditional Extreme Learning Machines (ELM) by incorporating deep learning principles, allowing the system to efficiently handle large and complex datasets. The deep architecture of DELM, with multiple hidden layers, is capable of extracting hierarchical features from input data, which aids in better decision-making and prediction accuracy.

In the ensemble approach, DELM serves as a powerful feature extractor, providing the Q-Learning agent with high-quality, processed input data. By utilizing the learned features from DELM, Q-Learning can focus on optimizing its action policy with more meaningful and abstract representations of the environment. This combination of Q-Learning and DELM ensures that the system can make informed, efficient decisions in complex environments while also processing large-scale data rapidly.

For example, in Body Area Networks for IoT (BAN-IoT), DELM can process the sensor data from wearable devices, extracting relevant features such as user activity levels, environmental factors, and health metrics. Q-Learning then uses this extracted information to dynamically allocate resources, such as adjusting power consumption, network bandwidth, or prioritizing critical tasks, in real-time based on the agent’s experience and reward feedback.
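A sketch of this wiring is given below, reusing the DELM helpers and the Q-table functions from the earlier sketches; the hash-style state discretization and the reading of actions as bandwidth or power levels are hypothetical choices, since the paper does not specify the exact state encoding.

```python
# Illustrative ensemble glue: DELM activations become the processed state fed
# to the Q-Learning agent. Reuses delm_fit weights and the choose_action /
# update helpers from the earlier sketches; the binning scheme is assumed.
import numpy as np

def delm_features(x, weights):
    """Pass one sample through the fixed random DELM hidden layers."""
    h = x
    for W, b in weights:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h

def to_state(features, n_states=50, n_bins=4):
    """Coarsely discretize DELM activations into a Q-table state index."""
    bins = np.minimum((features * n_bins).astype(int), n_bins - 1)
    return int(bins.sum()) % n_states

def control_step(x, x_next, reward, weights):
    """One management step: features -> state -> action -> Q-table update."""
    z = to_state(delm_features(x, weights))
    a = choose_action(z)                  # e.g., a bandwidth or power level
    z_next = to_state(delm_features(x_next, weights))
    update(z, a, reward, z_next)          # reward reflects latency/energy QoS
    return a
```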

This hybrid approach addresses the challenges posed by large-scale IoT networks, where real-time decision-making and efficient resource management are essential. The combination of Q-Learning’s reinforcement learning and DELM’s deep learning capabilities results in a system that can optimize both data processing and resource allocation, providing scalable solutions to complex problems in IoT and Fog computing environments.

4. Results and Discussion

This section provides an evaluation of the proposed approach, focusing on key performance metrics and analyzing the effectiveness of the hybrid Q-Learning and DELM model in optimizing resource allocation and processing data in large-scale IoT and Fog environments.

4.1 Implementation Details

The proposed framework is implemented on an Intel Core i7 CPU with a 512 GB SSD, a 12 GB NVIDIA GPU, and 16 GB of RAM, operating at a frequency of 3.6 GHz. The system records datasets from IoT-BAN-Fog environments. A total of 90 nodes were deployed, interfacing with a variety of medical sensors, including ECG, pulse, and motion sensors. Table 1 presents the simulation parameters used for the experimentation. A total of 18,350 raw data points were recorded, encompassing eight attributes, and were used for training the network.

Table 1: Parameters used for implementation

S.No | Simulation Parameter | Specification
1 | Number of Nodes Deployed | 90
2 | Number of Fog Gateways | 4
3 | Initial Energy per BAN Node | 0.002 Joules
4 | Distance Variation Between BAN Nodes | 4-12 meters
5 | Equipped Transceivers | 5G
6 | Uplink Bandwidth | 250 Mbps
7 | Downlink Bandwidth | 125 Mbps
8 | RAM in Fog Gateways | 3 GB
9 | Number of Recorded Attributes | 8

Table 2: Performance measures employed in the evaluation

Sl. No | Performance Measure | Expression
1 | Accuracy | (TP + TN) / (TP + TN + FP + FN)
2 | Recall | TP / (TP + FN) × 100
3 | Specificity | TN / (TN + FP)
4 | Precision | TP / (TP + FP)
5 | F1-Score | 2 × (Precision × Recall) / (Precision + Recall)

4.2 Evaluation Metrics

The proposed algorithm was evaluated using multiple performance metrics, including accuracy, precision, recall, specificity, and F1-score. Accuracy measures the overall correctness of the model, while precision evaluates the proportion of true positive results among all positive predictions. Recall reflects the ability to correctly identify all relevant instances, and specificity measures the proportion of true negatives. F1-score provides a balanced measure of precision and recall. These metrics were assessed during the predictive performance analysis and compared with other existing deep learning-based Fog-BAN systems to demonstrate the effectiveness of the proposed approach. Additionally, latency and throughput were calculated as network-centric performance metrics and compared with other methods, highlighting the improvements in data transmission efficiency and response times.

Here TP and TN denote true positives and true negatives, while FP and FN denote false positives and false negatives.

Prediction outcomes fall into four cases: a true positive (TP) occurs when a value identified as positive is in fact positive; a false positive (FP) occurs when a value that is actually negative is identified as positive; a false negative (FN) occurs when a value that is actually positive is identified as negative; and a true negative (TN) occurs when a value identified as negative is in fact negative.
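For illustration, the expressions in Table 2 follow directly from these four counts; the counts below are made up purely for demonstration (multiply by 100 where a percentage form is reported).

```python
# Compute the Table 2 measures from confusion-matrix counts.
# The example counts are hypothetical, for demonstration only.
def metrics(tp, tn, fp, fn):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    recall      = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    precision   = tp / (tp + fp)
    f1          = 2 * precision * recall / (precision + recall)
    return accuracy, recall, specificity, precision, f1

print(metrics(tp=950, tn=880, fp=12, fn=8))  # hypothetical counts
```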

4.3 Experimental Outcome

Table 3 compares the performance of various models in IoT-Fog environments based on metrics such as accuracy, sensitivity, specificity, precision, and F1-score. Among the models, RF-SVM and CNN-LSTM show moderate performance, with accuracies of 89.4% and 91.7%, respectively. Hybrid Fuzzy-Q-Learning and DRL-GRU demonstrate improved metrics, with DRL-GRU achieving 95.8% accuracy due to its advanced sequential data handling. The proposed model, an ensembled combination of Q-learning and Deep Extreme Learning Machine, achieves the highest accuracy of 98.6% and excels across all metrics, showcasing its superior ability to handle large-scale IoT-Fog environments efficiently while minimizing latency.

Table 3: Comparison of Performance Metrics for Different Models in IoT-Fog Environments

Algorithm | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | F1-Score (%)
RF-SVM | 89.4 | 88.9 | 87.8 | 88.2 | 88.5
CNN-LSTM | 91.7 | 91.2 | 90.1 | 90.6 | 91.0
Hybrid Fuzzy-Q-Learning | 93.5 | 93.1 | 92.3 | 92.7 | 93.0
DRL-GRU | 95.8 | 94.9 | 94.5 | 94.7 | 95.2
Proposed Model | 98.6 | 99.1 | 98.8 | 98.9 | 99.0

Figure 4 illustrates the comparative performance of different models in IoT-Fog environments based on key metrics. It highlights the substantial improvement of the proposed model in all evaluation criteria, including accuracy, sensitivity, specificity, precision, and F1-score, when compared to other models. Notably, the proposed model outperforms existing hybrid and deep learning architectures, such as DRL-GRU and Hybrid Fuzzy-Q-Learning, demonstrating its effectiveness in achieving higher accuracy and better overall performance in large-scale IoT and Fog network scenarios.

Figure 4: Performance Evaluation of Various Models in IoT-Fog Environments

Figure 5 compares the latency performance of different learning-based Fog-BAN networks as the number of nodes increases. The proposed model outperforms others by maintaining lower latency, demonstrating its effectiveness in large-scale IoT and Fog environments. This highlights its ability to optimize resource allocation and ensure efficient data processing even with higher node counts.

Figure 5: Latency Performance Comparison of Different Learning-Based Fog-BAN Networks with Varying Node Counts

5. Conclusions

This research demonstrates the transformative potential of integrating Q-Learning with Deep Extreme Learning Machine (DELM) in optimizing large-scale IoT and Fog computing environments. The proposed framework successfully addresses critical challenges in network management, achieving exceptional performance metrics with 98.6% accuracy, 99.1% sensitivity, and a 99.0% F1-score, significantly outperforming traditional approaches like RF-SVM and CNN-LSTM. The integration of Q-Learning's adaptive decision-making capabilities with DELM's efficient data processing has proven particularly effective in reducing latency and optimizing resource allocation across distributed networks. The framework's success in handling real-time data processing while maintaining high performance metrics demonstrates its practical viability for diverse applications, particularly in healthcare monitoring systems where reliable, low-latency operation is crucial. Future enhancements could focus on improving security through blockchain integration, implementing smart energy management systems, expanding network scalability, and developing more sophisticated real-time analytics capabilities. Additional research could explore integrating advanced machine learning algorithms to enhance prediction accuracy and system performance. The framework could also be extended to support more diverse IoT applications and incorporate emerging technologies for better resource optimization.

Language: English
Page range: 106 - 119
Submitted on: Sep 6, 2024
Accepted on: Oct 15, 2024
Published on: Feb 24, 2025
Published by: Future Sciences For Digital Publishing
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2025 Sharan Kumar, Venkata Ramana Kaneti, Vandana Sharma, published by Future Sciences For Digital Publishing
This work is licensed under the Creative Commons Attribution 4.0 License.