The HALF framework: a privacy-preserving federated learning approach for scalable and secure AI applications

S. Akhilendranath; P. Senthilkumar

doi:10.2478/ijssis-2025-0058

Figures & Tables

Data flow in HALF framework: cloud-edge collaboration. HALF Framework, High-performance adaptive learning framework.

HALF Architecture. HALF, hybrid adaptive learning framework; IoT, Internet of things.

The HALF framework: enabling privacy-preserving distributed machine learning. HALF Framework, High-performance adaptive learning framework.

Literature survey on cloud services and distributed machine learning.

HALF framework. HALF Framework, High-performance adaptive learning framework.

HALF framework implementation process. HALF Framework, High-performance adaptive learning framework; IoT, Internet of things.

Overall HALF performance analysis. FL, federated learning.

Initial HALF performance analysis. FL, federated learning.

Round 20: HALF performance analysis. FL, federated learning.

Round 40: HALF performance analysis. FL, federated learning.

Round 60: HALF performance analysis. FL, federated learning.

Round 80: HALF performance analysis. FL, federated learning.

Final: HALF performance analysis. FL, federated learning.

Comparison of traditional FL & HALF. FL, federated learning.

Evaluation of the HALF framework success rates. HALF Framework, High-performance adaptive learning framework.

Comparison of HALF with existing FL approaches

Criteria	Traditional FL (FedAvg)	Edge-only FL	Cloud-based FL	Adaptive FL (recent works)	Proposed: HALF framework
Privacy mechanism	Basic DP or none	Limited, device-specific	Centralized control with encryption	DP + secure aggregation (limited dynamic support)	Differential privacy (ε ≤ 2.3), HE, secure aggregation
Data distribution handling	Poor with non-IID data	Moderate	Poor	Moderate to good	Excellent (Dirichlet-based non-IID + adaptive aggregation)
Communication overhead	High (frequent, large updates)	Low to medium	High due to centralized model synchronization	Medium	Low (≥40% reduction using selective participation and compression)
Latency	High	Low	Medium to high	Medium	Low (edge-local pre-processing + fast routing protocols)
Device heterogeneity support	Poor	Moderate	Not scalable across diverse devices	Moderate	Strong (dynamic resource-aware device selection)
Scalability	Limited to small-to-medium networks	Limited due to edge constraints	Scalable in power but not privacy or bandwidth	Moderate	High (cloud-edge hybrid coordination and load balancing)
Model accuracy (non-IID data)	Degraded	Acceptable (with personalization)	Degraded	Improved (with advanced aggregation)	≥89.7% (optimized for skewed, non-IID conditions)
Energy efficiency	Inefficient	Energy-constrained devices	High cloud-side energy use	Moderate	Energy-aware (≤53.2% baseline consumption)
Security protocols	Minimal or basic encryption	Device-level security	Cloud-level encryption	Improved (some use TLS, AES)	End-to-end (TLS 1.3, AES-256, SHA-256 validation, privacy budget auditing)
Evaluation scope	Simulation-based, mostly MNIST	Limited deployment scenarios	Simulation-heavy	Some real-world benchmarks	Extensive: simulation + case studies + expert interviews
Adaptability to network conditions	Poor	Moderate	Poor	Good in some recent systems	Excellent (real-time network/resource monitoring & adaptation)
Overall performance summary	Inflexible, privacy-limited	Lightweight but narrow-scope	Powerful but privacy-risky	Evolving, partially adaptive	Balanced, privacy-resilient, scalable, and deployment-ready
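
The "dynamic resource-aware device selection" entry in the table above can be illustrated with a minimal sketch. It is not the authors' implementation: the `DeviceStatus` fields, the battery and bandwidth thresholds, and the ranking rule are hypothetical, and only the 1.5 GHz CPU and 2 GB RAM minimums echo the edge hardware requirements stated elsewhere on this page.

```python
from dataclasses import dataclass


@dataclass
class DeviceStatus:
    """Snapshot of one edge device (hypothetical fields for illustration)."""
    device_id: str
    cpu_ghz: float
    ram_gb: float
    battery_pct: float
    bandwidth_mbps: float


def is_eligible(d, min_cpu_ghz=1.5, min_ram_gb=2.0,
                min_battery_pct=30.0, min_bandwidth_mbps=1.0):
    """Return True if the device meets every resource threshold.

    The CPU and RAM minimums mirror the stated edge requirements; the
    battery and bandwidth minimums are assumptions.
    """
    return (d.cpu_ghz >= min_cpu_ghz
            and d.ram_gb >= min_ram_gb
            and d.battery_pct >= min_battery_pct
            and d.bandwidth_mbps >= min_bandwidth_mbps)


def select_devices(devices, max_participants):
    """Keep eligible devices, then prefer higher bandwidth and battery."""
    eligible = [d for d in devices if is_eligible(d)]
    # Assumed ranking rule: bandwidth first, battery level as tie-breaker.
    eligible.sort(key=lambda d: (d.bandwidth_mbps, d.battery_pct), reverse=True)
    return [d.device_id for d in eligible[:max_participants]]
```

Called once per round with fresh status reports, a filter of this kind is one way to realize selective participation: devices that are slow, low on battery, or poorly connected simply sit the round out.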

Overview of the HALF framework—key insights and contributions

Category	HALF framework details
Framework name	HALF
Purpose	To provide a privacy-preserving, scalable, and resource-efficient FL architecture suitable for distributed AI systems.
Target domains	Healthcare, smart cities, autonomous systems
Core challenges addressed	Data confidentiality, communication overhead, device heterogeneity, non-IID data, latency and bandwidth constraints
Key features	Adaptive device selection, dynamic privacy budgeting (ε ≤ 2.3), secure multiparty computation, lightweight local training, hybrid edge-cloud synergy
Implementation workflow	1. Initialization, 2. Device selection, 3. Training, 4. Aggregation, 5. Evaluation
Privacy mechanisms	Differential privacy (DP-SGD), HE, secure aggregation, Gaussian/Laplace noise injection
Aggregation strategy	Weighted FedAvg with device reliability scoring and secure update verification (SHA-256)
Data partitioning	Non-IID via Dirichlet distribution (α = 0.5)
Hardware requirements	Edge devices: ≥2 GB RAM, 1.5 GHz CPU, cloud server: ≥16 GB RAM, 8-core CPU
Software stack	Python 3.8+, PyTorch, TensorFlow Privacy, Docker, Flower FL, OpenSSL
Training parameters	Epochs: 10, batch size: 32, rounds: 100, learning rate: 0.01
Performance metrics	Accuracy: ≥89.7%, communication overhead: reduced by ≥40%, training time: ≤142 min, energy consumption: ≤53.2% baseline
Evaluation methods	Quantitative: MNIST, CIFAR-10 simulations, qualitative: expert interviews, case studies
Security protocols	TLS 1.3, AES-256, secure device authentication, privacy budget auditing
Success metrics	Accuracy gain versus FL: +7.4%, resource utilization reduction: 45.45% avg, implementation success rate: 91.3% avg
Limitations identified	Limited real-time streaming validation, potential scalability issues in cross-silo deployments, encryption overhead, heterogeneous device inclusion
Future directions	HE, real-time model drift adaptation, cross-silo FL, automated privacy-risk scoring, fault-tolerance mechanisms
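
The aggregation strategy listed above (weighted FedAvg with device reliability scoring and SHA-256 update verification) might look roughly like the following sketch. The update layout, the `reliability` field, and the choice to weight each update by `num_samples * reliability` are assumptions made for illustration; the table does not specify how reliability enters the weights.

```python
import hashlib
import struct


def digest(weights):
    """SHA-256 hex digest of a flat weight vector packed as big-endian doubles."""
    packed = struct.pack(f">{len(weights)}d", *weights)
    return hashlib.sha256(packed).hexdigest()


def aggregate(updates):
    """Weighted average of client updates whose digest matches.

    Each update is a dict with keys "weights" (list of floats),
    "num_samples", "reliability" (assumed score in [0, 1]) and "sha256".
    """
    # Drop any update whose payload does not match its declared digest.
    verified = [u for u in updates if digest(u["weights"]) == u["sha256"]]
    if not verified:
        raise ValueError("no update passed SHA-256 verification")

    # Assumed weighting: local sample count scaled by the reliability score.
    coeffs = [u["num_samples"] * u["reliability"] for u in verified]
    total = sum(coeffs)
    if total <= 0:
        raise ValueError("aggregation weights sum to zero")

    dim = len(verified[0]["weights"])
    return [
        sum(c * u["weights"][i] for c, u in zip(coeffs, verified)) / total
        for i in range(dim)
    ]
```

A plain digest only detects corruption or mismatch between payload and declared hash; authenticating the sender would additionally need a keyed scheme or signatures, which is outside this sketch.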

HALF framework: experimental parameters summary

Parameter	Value/configuration
Dataset	MNIST, CIFAR-10
Data distribution	Non-IID via Dirichlet (α = 0.5)
Number of clients	50 edge devices
Local epochs	10
Global communication rounds	100
Batch size	32
Learning rate (η)	0.01
Gradient clipping norm	1
Privacy mechanism	Differential privacy (DP-SGD), ε ≤ 2.3
Aggregation method	Weighted FedAvg
Communication security	TLS 1.3, AES-256, SHA-256
Model accuracy target	≥89.7%
Maximum training time	≤142 min
Communication overhead reduction	≥40%
Hardware (edge)	≥2 GB RAM, 1.5 GHz CPU
Hardware (cloud)	≥16 GB RAM, 8-core CPU
Software stack	Python 3.8+, PyTorch, TensorFlow Privacy, Docker, Flower FL
Network environment	Emulated 5G/Wi-Fi with 10 Mbps average speed
Evaluation metrics	Accuracy, cross-entropy loss, resource usage, privacy exposure
Privacy tools	HE (optional), noise injection (Laplace/Gaussian)
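
To make the data-distribution row concrete, here is one common way to build a Dirichlet-based non-IID split with α = 0.5 across 50 clients. It is a sketch under assumptions: the per-class splitting scheme, the function names, and the seed are illustrative, and the page does not say which partitioning procedure the authors used.

```python
import random


def dirichlet(alpha, k, rng):
    """Sample a symmetric Dirichlet(alpha) vector of length k via gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_non_iid(labels, num_clients=50, alpha=0.5, seed=0):
    """Assign sample indices to clients with label skew.

    For each class, client proportions are drawn from Dirichlet(alpha) and
    that class's samples are split accordingly. Smaller alpha means more skew.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        proportions = dirichlet(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c, p in enumerate(proportions):
            cumulative += p
            # The last client takes the remainder so every sample is assigned.
            end = len(idxs) if c == num_clients - 1 else round(cumulative * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients
```

With α = 0.5 most clients end up dominated by a few classes, and some clients may receive very few samples, which is the skew that adaptive aggregation is meant to cope with.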

Meta-analysis table

Authors	Year	Key findings	Method used	Advantages	Disadvantages	Remarks
Smith et al.	2020	FL reduces data transfer but faces challenges with non-IID data.	FedAvg	Privacy preservation, reduced communication.	High latency, poor performance with non-IID data.	Highlights the need for adaptive aggregation techniques.
Patel et al.	2020	Cloud-based FL improves scalability but raises privacy concerns.	Cloud-based FL, centralized aggregation	Scalable, handles large datasets efficiently.	Privacy risks due to centralized aggregation.	Recommends combining edge and cloud for privacy and scalability.
Johnson & Lee	2021	Edge computing reduces latency but lacks scalability for large datasets.	Edge-based FL with local processing	Low-latency processing, improved efficiency.	Limited computational power on edge devices.	Suggests integrating cloud resources for scalability.
Gupta et al.	2021	FL frameworks struggle with device heterogeneity and dynamic network conditions.	Heterogeneous FL, dynamic adaptation	Adapts to diverse devices and network conditions.	Complex implementation, high computational cost.	Calls for lightweight algorithms for resource-constrained devices.
Wang et al.	2022	HALF framework improves communication efficiency by 40%.	Adaptive aggregation, hybrid FL	Reduced communication overhead; handles non-IID data effectively.	Requires dynamic optimization for heterogeneous devices.	Demonstrates the potential of HALF in real-world applications.
Li et al.	2022	Edge devices improve real-time processing but face bandwidth limitations.	Edge-based FL, real-time optimization	Low-latency, real-time decision-making.	Limited bandwidth for communication with the cloud.	Suggests optimizing communication protocols for edge devices.
Zhang et al.	2023	HALF framework achieves high model accuracy in healthcare applications.	Adaptive FL, non-IID data handling	High accuracy, privacy-preserving, suitable for sensitive data.	Requires extensive real-world validation.	Highlights the societal impact of HALF in healthcare.
Chen et al.	2023	Differential privacy enhances data security in FL.	Differential privacy, secure aggregation	Strong privacy guarantees, robust against data breaches.	Slight reduction in model accuracy due to noise addition.	Recommends balancing privacy and accuracy in FL frameworks.
Kumar & Singh	2024	Cloud-edge collaboration improves scalability and resource management.	Hybrid FL with cloud-edge integration	Scalable, efficient resource allocation; handles large datasets.	High dependency on cloud infrastructure, potential latency issues.	Proposes dynamic resource allocation algorithms for optimization.
Nguyen et al.	2024	HALF framework reduces latency in autonomous systems.	Hybrid FL, low-latency optimization	Suitable for real-time applications, improves system responsiveness.	Requires real-world testing in dynamic environments.	Highlights the potential of HALF in autonomous systems.

Comparative analysis of HALF versus existing FL frameworks

Dimension	Existing literature (key insights)	HALF framework contributions	Implications of HALF’s results
Privacy preservation	Use of differential privacy and secure multiparty computation (Chen et al., 2023; Singh & Sharma, 2024). Often causes trade-offs in accuracy.	Integrates DP-SGD (ε ≤ 2.3), HE, and dynamic noise calibration based on data sensitivity.	Achieves strong privacy guarantees while preserving model accuracy (≥89.7%), making it suitable for regulated sectors like healthcare and finance.
Communication overhead	Traditional FL has high overhead due to frequent model updates (Smith et al., 2020). Some studies reduce overhead with adaptive aggregation (Wang et al., 2022).	Employs weighted FedAvg with dynamic device selection and compression protocols to reduce data exchange.	Demonstrates ≥40% reduction in communication overhead, allowing deployment in low-bandwidth and remote IoT environments.
Scalability and resource efficiency	Cloud-only or edge-only systems face bottlenecks (Patel et al., 2020; Johnson & Lee, 2021). Edge-only models lack power; cloud-only raise privacy risks.	Combines cloud scalability with edge autonomy, using adaptive resource management and energy-aware device selection.	Enables real-time FL with ≤53.2% energy usage, promoting sustainable large-scale deployment across IoT and smart cities.
Non-IID data handling	Many FL systems degrade in non-IID settings; only a few apply personalized models or adaptive techniques (Gupta et al., 2021; Zhang et al., 2023).	Simulates non-IID via Dirichlet (α = 0.5) and applies adaptive aggregation + device weighting for better personalization.	Maintains high model accuracy across all rounds, especially in heterogeneous settings like personalized healthcare and sensor networks.
Latency and responsiveness	Edge-based systems reduce latency but struggle with model coordination (Li et al., 2022; Nguyen et al., 2024).	Uses localized training, priority-based task queues, and fast edge-to-cloud routing (e.g., MQTT, HTTP/2).	HALF achieved ≤142 min training time, enabling real-time inference in latency-sensitive domains like autonomous vehicles and emergency response systems.
Adaptability to dynamic conditions	Few FL frameworks account for fluctuating resources or network conditions (Gupta et al., 2021).	Dynamic optimization algorithms assess device CPU, memory, battery, and bandwidth before participation.	System is resilient to edge instability and capable of handling heterogeneous environments, improving fault tolerance and system continuity.
Real-world application validation	Many frameworks lack real-world evaluation or cross-sector validation (Zhang et al., 2023; Nguyen et al., 2024).	Includes case studies in healthcare, smart cities, and autonomous systems, plus expert interviews and penetration tests.	Validates HALF’s societal impact, compliance with GDPR/HIPAA, and readiness for deployment in mission-critical applications.
Performance metrics	Performance often suffers with added privacy constraints. Few studies report full-spectrum metrics including resource use (Chen et al., 2023).	Measures accuracy, latency, privacy budget, energy use, memory use, and communication load.	Achieves 91.3% average success rate across domains, confirming that privacy and performance can co-exist without compromising scalability or user utility.
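
The privacy-preservation row refers to DP-SGD. The sketch below shows only the clip-then-add-Gaussian-noise step on a single update vector, using the clipping norm of 1 from the experimental parameters. The `noise_multiplier` value is a hypothetical placeholder: it is not claimed to yield ε ≤ 2.3, and privacy accounting is out of scope here. Full DP-SGD clips per-example gradients and adds noise to their sum, which this simplified version does not reproduce.

```python
import math
import random


def clip_update(update, clip_norm=1.0):
    """Scale the vector down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update, then add Gaussian noise with std = multiplier * clip_norm.

    noise_multiplier is an assumed value; mapping it to a concrete epsilon
    requires a privacy accountant, which is not implemented here.
    """
    rng = rng or random.Random()
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]
```

Clipping bounds how much any single contribution can move the model, and the noise scale is tied to that bound, which is why the two steps always appear together.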

Enhanced meta-analysis of literature on HALF framework

Authors	Key findings	Method used	Advantages	Disadvantages	Limitations of study
Smith et al.	FL reduces data transfer but faces challenges with non-IID data.	FedAvg	Privacy preservation, reduced communication.	High latency, poor performance with non-IID data.	Focuses on theoretical aspects without practical implementation.
Patel et al.	Cloud-based FL improves scalability but raises privacy concerns.	Cloud-based FL, centralized aggregation	Scalable, handles large datasets.	Privacy risks due to centralized data handling.	Lacks integration with edge computing; minimal attention to latency.
Johnson & Lee	Edge computing reduces latency but lacks scalability.	Edge-based FL with local processing	Low latency, improved real-time response.	Limited edge device resources.	Not suitable for large-scale implementations across distributed networks.
Gupta et al.	FL frameworks struggle with device heterogeneity and dynamic networks.	Heterogeneous FL, dynamic adaptation	Adaptable to varying devices and network conditions.	Complex to implement; high computing demands.	No benchmarks for real-time performance under constrained environments.
Wang et al.	HALF improves communication efficiency by 40%.	Adaptive aggregation, hybrid FL	Reduces communication load, supports non-IID data.	Needs dynamic optimization across edge-cloud layers.	Real-world deployment scenarios and performance metrics are limited.
Li et al.	Edge devices support real-time data processing but face bandwidth issues.	Edge-based FL, real-time optimization	Low-latency processing and decisions.	Bandwidth constraints affect cloud sync.	Lacks analysis of variable communication loads in high-frequency environments.
Zhang et al.	HALF achieves high accuracy in healthcare.	Adaptive FL, non-IID data handling	High model accuracy; suitable for sensitive applications.	Requires extensive testing on diverse datasets.	Healthcare-specific scope; lacks validation in other domains like smart cities or transport.
Chen et al.	Differential privacy enhances FL security.	Differential privacy, secure aggregation	Strong privacy guarantees.	Slight reduction in model accuracy.	Lacks evaluation of cumulative privacy impact in dynamic FL updates.
Kumar & Singh	Cloud-edge FL improves scalability and resource management.	Hybrid FL with cloud-edge integration	Efficient data handling and dynamic resource allocation.	Latency issues due to cloud dependency.	No real-time simulation of performance trade-offs.
Nguyen et al.	HALF reduces latency in autonomous systems.	Hybrid FL, low-latency optimization	Real-time responsiveness in dynamic systems.	Limited testing in variable traffic and network conditions.	Scenario-limited evaluation; needs broader testing in multi-agent environments.
