
A Holistic review and performance evaluation of unsupervised learning methods for network anomaly detection

Open Access | May 2024

Figures & Tables

Figure 1:

Bibliometric analysis of AD literature: (A) chronological distribution of bibliometric papers; (B) percentage distribution of publication types. AD, anomaly detection.

Figure 2:

Roadmap of the paper.

Figure 3:

Taxonomy of 13 selected UL methods for anomaly detection from 6 families. UL, unsupervised learning.

Figure 4:

Framework of NAD based on UL [45]. NAD, network anomaly detection; UL, unsupervised learning.

Figure 5:

Categorization of UL techniques [23]. UL, unsupervised learning.

Figure 6:

Comparison of UL methods on the NSL-KDD dataset. UL, unsupervised learning.

Figure 7:

Comparison of UL methods on the UNSW-NB15 dataset. UL, unsupervised learning.

Figure 8:

Comparison of UL methods on the CIC-IDS2017 dataset. IDS, intrusion detection systems; UL, unsupervised learning.

A relative comparison of this paper with existing surveys/review papers

References | Discussion | Focus | Criteria (1-6)
[15] | Focus on literature review and performance evaluation of UL methods. | IDS | ××××
[16] | It provides a comparative analysis of various unsupervised NAD approaches and investigates standard metrics suitable for many connections. | AD in networks | ×××
[17] | Presents a detailed review of different clustering methods and their strengths and weaknesses. | - | ××××
[18] | Presents a framework for evaluating anomaly detection methods for HTTP. | HTTP AD | ×××
[19] | An overview of ML techniques for intrusion detection between 2000 and 2007 is presented. | IDS | ××××
[20] | It provides an overview and comparison of various clustering techniques, pros, and cons. | AD | ××××
[21] | Performance evaluation of four novelty detection algorithms is carried out. | Novelty detection | ××
[22] | Presents a detailed review of popular traditional and modern clustering approaches. | - | ××××
[1] | Nineteen widely employed unsupervised techniques are evaluated to provide an understanding of their performance in various domains. | AD | ×××
[2] | The proposed work focuses on a comparative evaluation of four UL algorithms. | Smart city wireless networks | ×××
[3] | A survey on ML techniques focusing on unsupervised and hybrid IDS is presented. | IDS | ××
[4] | The proposed work aims to provide an experimental comparison of unsupervised methods employed for NAD against five intrusion detection datasets. | Anomaly-based IDS | ××
[23] | A comprehensive survey of UL techniques in networking is presented. | Anomaly-based in networking | ××
[5] | Review and compare ML algorithms for IDS. | IDS | ×××
[6] | Presents a detailed review of state-of-the-art IDS methods, datasets, performance measures, and research challenges. | IDS | ×
[8] | An in-depth review of state-of-the-art clustering methods is presented. | - | ××
[9] | A comprehensive overview of UL techniques for AD is presented, emphasizing their utility in scenarios with scarce labeled data. | AD in industrial applications | ××
[11] | Review and compare outlier detection techniques from seven different families. | Outlier detection | ×××
[12] | AD techniques are explored comprehensively to address the detection of emerging threats. | IoT and sensor data | ××
[13] | In-depth review of various unsupervised and semisupervised clustering methods. | - | ××
[14] | Focus on log file analysis for early incident detection, particularly emphasizing self-learning AD techniques. | AD | ×××
Our survey | The primary focus is on studying UL NAD methods while considering recent advances. | NAD |

Relative comparison of UL approaches for network anomaly/intrusion detection

Authors | Methodology | Algorithm | Dataset | Input data | DoS | Probe | R2L | U2R | Real-time detection
[83] | Unsupervised approach for outlier detection using subspace clustering and evidence accumulation. | DBSCAN | METROSEC | Continuous | 95 | 95 | 85 | 85
[84] | Tree-based subspace clustering approach for high-dimensional datasets; includes cluster stability analysis. | Tree-based subspace clustering (TCLUS) | KDDCUP99, TUIDS | Mixed | 99 | 96 | 86 | 66
[85] | Multiclustering scheme incorporating subspace clustering and evidence accumulation to overcome knowledge-based approach limitations. | DBSCAN | KDDCUP99, METROSEC | N/A | - | - | - | -
[86] | Particle swarm optimization clustering strategy based on map-reduce methodology for parallelization in large-scale networks. | PSO clustering | KDDCUP99 | Mixed | Max AUC: 96.3
[87] | Novel strategy for automatic tuning and optimization of detection model parameters in a real network environment. | Clustering and one-class SVM | KDDCUP99 and Kyoto University | Continuous | - | - | - | -
[88] | K-means clustering to generate training subsets; neuro-fuzzy and radial-basis SVM for classification. | K-means, SVM, neuro-fuzzy neural network | KDDCUP99 | N/A | 98.8 | 97.31 | 97.5 | 97.5
[89] | Innate-immune strategy via UL for categorizing normal and abnormal profiles. | DBSCAN | KDDCUP99 | N/A | FPR: 0.008, TNR: 0.991, ACC: 77.1, Recall: 0.589
[49] | Novel approach using cluster centers and NNs to transform the dataset into a one-dimensional feature set. | Clustering, KNN | KDDCUP99 | N/A | 99.68 | 87.61 | 3.85 | 57.02
[90] | Nature-inspired meta-heuristic approach to optimize the optimum path forest clustering algorithm. | Optimum path forest clustering | ISCX, KDDCUP99, NSL-KDD | N/A | Purity measure: ISCX: 96.3, KDDCUP99: 71.66, NSL-KDD: 99.8
[91] | UL technique for real-time detection of fast and complex zero-day attacks. | DBSCAN | DARPA, ISCX | N/A | ACC: 98.39%, Recall: 100%, Precision: 98.12%, FP: 3.61
[92] | Modified optimum path forest algorithm to enhance IDS performance, particularly in detecting less frequent attacks. | K-means, modified optimum path forest | NSL-KDD | N/A | 96.89 | 85.92 | 77.98 | 81.13
[93] | Hierarchical agglomerative clustering applied to SOM network to lower computational cost and sensitivity. | Hierarchical agglomerative clustering, SOM | NSL-KDD | Mixed | DR: 96.66%, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033
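
Several of the approaches in the table above rely on DBSCAN, which has no explicit anomaly class but labels points that fall outside every dense region as noise. The following minimal Python sketch illustrates that idea only; it is not drawn from any of the cited works, and the synthetic feature matrix, eps, and min_samples values are illustrative assumptions that would need tuning on real flow features.

```python
# Minimal sketch (not the cited authors' code): DBSCAN-based anomaly flagging,
# assuming flow records have already been converted to a numeric feature matrix.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Placeholder data: 500 "normal" flows plus 10 injected outliers.
X = np.vstack([rng.normal(0, 1, size=(500, 5)),
               rng.normal(6, 1, size=(10, 5))])

X_scaled = StandardScaler().fit_transform(X)

# eps and min_samples are illustrative; in practice they are tuned per dataset.
labels = DBSCAN(eps=1.5, min_samples=10).fit_predict(X_scaled)

# DBSCAN marks points that belong to no dense region with label -1;
# these noise points are treated as anomalies.
anomalies = np.where(labels == -1)[0]
print(f"Flagged {len(anomalies)} of {len(X)} records as anomalous")
```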

Comparative analysis of network datasets

Dataset | Data source | Year | 1 | 2 | 3 | 4 | 5 | 6 | 7
DARPA (1998) | MIT Lincoln Laboratory | 1998 | Emulated | Benchmark | 7,000,000 | DoS, Probe, U2R, R2L | 41
KDD CUP 99 | University of California | 1999 | Emulated | Benchmark | 4,900,000 | DoS, Probe, U2R, R2L | 41
DEFCON | N/A | 2000 | Real | Benchmark | N/A | N/A | N/A | Flag traces
LBNL | Lawrence Berkeley National Laboratory | 2004 | N/A | Benchmark | >100 hr | × | Malicious traces | Internet traces
Kyoto | Song et al. (2006) [114] | 2006 | Real | Benchmark | N/A | Normal and attack sessions | 24
CAIDA | Jonker et al. (2017) [118] | 2008–2017 | Real | Benchmark | Huge | × | DDoS | 20
CDX | United States Military Academy | 2009 | Real | Real-life | N/A | 5771 | Buffer overflow | 5
NSL-KDD | Tavallaee et al. (2009) [126] | 2009 | Emulated | Benchmark | 148,517 | DoS, Probe, U2R, R2L | 41
ISCX 2012 | Shiravi et al. (2012) [117] | 2012 | Real | Real-life | 2,450,324 | DoS, DDoS, Bruteforce, Infiltration | IP flows
UNSW-NB15 | Moustafa and Slay (2015) [119] | 2015 | Emulated | Benchmark | 2,540,044 | Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, Worms | 49
CIDDS-001 | Ring et al. (2017) [120] | 2017 | Emulated | Benchmark | 31,959,267 | Ping scanning, Port scanning, Brute force, and DoS | 14
CIDDS-002 | Ring et al. (2017) [120] | 2017 | Emulated | Benchmark | 16,161,183 | Ping scanning, Port scanning, Brute force, and DoS | 14
CIC-IDS2017 | Sharafaldin et al. (2018) [121] | 2017 | Emulated | Benchmark | 2,830,743 | DDoS, DoS, Botnet, BruteForce, Infiltration, WebAttack, Port scan | 80
CSE-CICIDS2018 | Bharati and Tamane (2020) [122] | 2018 | Emulated | Benchmark | 16,232,943 | DDoS, DoS, Botnet, BruteForce, Infiltration, WebAttack, Port scan | 80
CICDDoS2019 | Sharafaldin et al. (2019) [123] | 2019 | Emulated | Benchmark | 32,925 | DDoS_DNS, DDoS_LDAP, DDoS_MSSQL, DDoS_NetBIOS, DDoS_NTP, DDoS_SNMP, DDoS_SSDP, DDoS_SYN, DDoS_TFTP, DDoS_UDP, DDoS_UDP-Lag, DDoS_WebDDoS | 76
BoT-IoT | Koroniotis et al. (2019) [124] | 2019 | Real | Real-life | 73,360,900 | DoS, DDoS, Reconnaissance, Theft | 29
IoT-23 | N/A | 2020 | Real | Real-life | N/A | Mirai, Torii, Hide and Seek, Muhstik, Hakai, IRCBot, Hajime, Trojan, Kenjiro, Okiru, Gafgyt | 21
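
The datasets above ship in different formats, but NSL-KDD is distributed as a simple CSV-like file, which makes it a convenient starting point for experiments. A hedged loading sketch follows; the file name KDDTrain+.txt, the 41-feature/label/difficulty column layout, and the positions of the categorical columns are assumptions based on the common NSL-KDD distribution, not details taken from this paper.

```python
# Illustrative loading sketch, assuming the NSL-KDD "KDDTrain+.txt" CSV layout
# (41 features, an attack label, and a difficulty score; no header row).
import pandas as pd

cols = [f"f{i}" for i in range(41)] + ["label", "difficulty"]
df = pd.read_csv("KDDTrain+.txt", header=None, names=cols)

# Collapse the specific attack names into a binary normal/anomaly target,
# as is common when benchmarking unsupervised detectors.
df["anomaly"] = (df["label"] != "normal").astype(int)

# One-hot encode the three categorical features (protocol, service, flag),
# which occupy the 2nd-4th columns in this assumed layout.
X = pd.get_dummies(df.drop(columns=["label", "difficulty", "anomaly"]),
                   columns=["f1", "f2", "f3"])
y = df["anomaly"]
print(X.shape, y.mean())
```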

Recent developments in unsupervised learning-based network intrusion/anomaly detection

Authors/year | Methodology | Algorithm/techniques used | Datasets | Attack class | Metrics (%) | Limitations/future work
Amoli et al. (2015) [91] | Real-time intrusion detection using adaptive thresholds. | DBSCAN | DARPA, ISCX | DoS, DDoS, POD, SMURF, Mail-bomb, botnet, port scanning/port sweep | Precision: 98.12, ACC: 98.39, FPR: 3.61 | Future work: detecting complex attacks and distinguishing flash crowds from DDoS for better clarity.
Zhang et al. (2016) [42] | Utilizes one-class SVM for network intrusion identification. | One-class SVM | NSL-KDD | DoS, Probe, U2R, R2L | Precision: 99.3, Recall: 91.61, F-value: 95.18 | Low DR for minority-class attacks like U2R and R2L.
Landress (2016) [61] | Employs K-means clustering, a J48 decision tree, and SOM to reduce false positives. | K-means, J48 decision tree, self-organizing map | KDDCUP99 | DoS, Probe, U2R, R2L | ACC: 98.92 | Increased computational complexity with SOM; explore efficient hybrid techniques for real-time processing.
He et al. (2017) [94] | Exploits SDN for effective anomaly detection. | DBSCAN | KDDCUP99 | DoS, Probe, U2R, R2L | ACC: 94.5 | Future work: focus on real-time packet clustering for timely detection.
Ariafar and Kiani (2017) [95] | Optimizes clusters using GA, K-means, and decision trees. | K-means, decision tree, GA | NSL-KDD | DoS, Probe, U2R, R2L | DR: 99.1, FAR: 1.8 | Evaluate on modern datasets like CIC-IDS2017 and CSE-CICIDS2018.
Bigdeli et al. (2018) [96] | Incremental cluster updates with spectral and density-based clustering. | Spectral and density-based clustering | KDDCUP99, NSL-KDD, Darpa98, IUSTsip, DataSetMe | DoS, Probe, U2R, R2L | DR: 94, FAR: 4 | Introduce concept drift while merging clusters.
Almi'ani et al. (2018) [97] | Hierarchical agglomerative clustering using K-means for reduced training time. | Hierarchical agglomerative clustering | NSL-KDD | DoS, Probe, U2R, R2L | DR: 96.66, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033 | Low accuracy due to low sensitivity toward normal behavior; investigate fuzzy C-means clustering for accuracy enhancement.
Zhou et al. (2019) [98] | Hybrid technique using KPCA and ELM. | KPCA, ELM | KDDCUP99 | DoS, Probe, U2R, R2L | DR: DoS: 98.96, Probe: 98.54, R2L: 94.72, U2R: 36.54; ACC: 98.18, FAR: 2.38 | The DR of U2R is quite low; evolutionary algorithms need to be applied to optimize the results of ELM.
Choi et al. (2019) [99] | Extracts key features using an AE. | AE | NSL-KDD | DoS, Probe, U2R, R2L | ACC: 91.70 | Improve U2R DR; apply evolutionary algorithms to optimize ELM results.
Aliakbarisani et al. (2019) [74] | Formulates a constraint trace ratio optimization problem for the Laplacian Eigenmap strategy. | Constraint trace optimization, PCA, LDA | NSL-KDD, Kyoto 2006+ | DoS, Probe, U2R, R2L | ACC: 97.84, F-score: 0.878, FPR: 0.001 | Train and test with different datasets; explore ensemble strategies based on UL models for enhanced performance.
Paulauskas and Baskys (2019) [72] | Employs the HBOS mechanism for identifying rare attacks like U2R and R2L. | HBOS | NSL-KDD | DoS, Probe, U2R, R2L | F-measure: 87 | Investigate online learning metrics for newly discovered attacks.
Hwang et al. (2020) [100] | CNN and AE for extracting raw features from network traffic. | CNN, AE | USTC-TFC 2016, Mirai-RGU, Mirai-CCU | SYN flood, UDP flood, ACK flood, HTTP flood, Mirai C&C | ACC: 99.77, Precision: 99.93, Recall: 99.17, F1-measure: 99.55, FNR: 0.02, FPR: 0.83 | Improve performance rate of majority classes.
Zavrak and Iskefiyeli (2020) [101] | Deploys a flow-based IDS with one-class SVM as the anomaly detector. | AE, VAE, One-class SVM | CIC-IDS2017 | DoS Slowloris, DoS SlowHTTPTest, DoS Hulk, DoS GoldenEye, Heartbleed | AUC: VAE: 75.96, AE: 73.98, One-class SVM: 66.36 | Consider flow-based attributes collected at specified time intervals to improve DR and reduce FAR.
Truong-Huu et al. (2020) [79] | Uses a GAN strategy to extract useful features and proposes a traffic aggregation technique. | GAN | UNSW-NB15, CIC-IDS2017 | Backdoors, Exploits, Worms, Shellcode, Generic, DoS Slowloris, DoS SlowHTTPTest, DoS Hulk, DoS GoldenEye, Heartbleed | UNSW-NB15: Precision: 0.84, Recall: 0.85, F1-score: 0.85, AUPRC: 0.8831; CIC-IDS2017: Precision: 0.8260, Recall: 0.8268, F1-score: 0.8264, AUROC: 0.9529, AUPRC: 0.8271 | Investigate a multi-class classification approach for identifying different types of attacks.
Prasad et al. (2020) [56] | Proposes a novel cluster center initialization approach to overcome shortcomings of conventional clustering techniques. | K-means clustering, cluster center initialization | CIC-IDS2017 | DoS Slowloris, DoS SlowHTTPTest, DoS Hulk, DoS GoldenEye, Heartbleed | DR: 88, FR: 88.5, Precision: 88, F-measure: 0.531, ACC: 88.6 | Address limitations like manual pre-processing and time/space complexity in MANET deployment.
Megantara and Ahmad (2021) [102] | Utilizes feature importance and data-reduction techniques to improve the prediction performance of NAD. | Decision tree, LOF | NSL-KDD, UNSW-NB15 | DoS, Probe, U2R, R2L; Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, Worms | NSL-KDD: ACC: 99.73; UNSW-NB15: ACC: 91.86 | The size of the LOF cluster can be optimized and the threshold value for handling outliers can be improved.
Liao et al. (2021) [103] | Presents an ensemble of UL schemes based on a novel weighting scheme. | AE, GANs | UNSW-NB15, CIC-IDS2017 | Backdoors, Exploits, Worms, Shellcode, Generic, DoS Slowloris, DoS SlowHTTPTest, DoS Hulk, DoS GoldenEye, Heartbleed | UNSW-NB15: Precision: 97.9, Recall: 92.4, F1-score: 94.9; CIC-IDS2017: Precision: 83.8, Recall: 84, F1-score: 83.5 | Address suboptimal results for specific attack families; explore optimization for better performance.
Verkerken et al. (2022) [71] | Proposes an inter-dataset evaluation approach for ensemble UL algorithms. | PCA, IF, AE, One-class SVM | CIC-IDS2017, CSE-CICIDS2018 | DoS Slowloris, DoS SlowHTTPTest, DoS Hulk, DoS GoldenEye, Heartbleed | ACC: PCA: 90.9, IF: 91.1, AE: 99.9, One-class SVM: 98.9; AUROC: PCA: 93.73, IF: 95.8, AE: 97.75, One-class SVM: 94.20 | Consider employing supervised learning approaches for generalization strength validation.
Singh and Jang-Jaccard (2022) [104] | Proposes a unified AE architecture based on CNN and LSTM to examine spatiotemporal correlations in traffic data. | MSCNN, LSTM-based AE | NSL-KDD, UNSW-NB15, CICDDoS2019 | Normal, Attack (binary class) | ACC: 97.10, Precision: 95.9, Recall: 96.4, F-score: 96.0 (average) | Fine-tune hyperparameters for enhanced model performance using optimization techniques.
Wang et al. (2022) [105] | Proposes an ensemble of UL algorithms to address processing overhead and poor detection performance for unseen threats. | AE, IF | CSE-CICIDS2018, MQTT-IoT-IDS2020 | Sparta SSH brute force, DoS-SlowHTTPTest, DoS-Hulk, DDoS-LOIC-UDP, DDoS-LOIC-HTTP, Brute force-XSS, Brute force-web, FTP-brute force | Average ACC: 96.43, Average Recall: 95.95, Average F-score: 96.02 | Improve DR of specific attacks like "DoS-Hulk" in the proposed work.
De C. Bertoli et al. (2022) [106] | Presents an NIDS for generalized detection performance in heterogeneous networks. | Deep AE, FL, Energy Flow Classifier | UNSW-NB15, CSE-CICIDS2018, BoT-IoT, ToN-IoT | Normal, Attack (binary class) | BoT-IoT: ACC: 93, Recall: 93, Precision: 99, F-score: 95; ToN-IoT: ACC: 85, Recall: 85, Precision: 87, F-score: 77; UNSW-NB15: ACC: 97, Recall: 99, Precision: 57, F-score: 73; CSE-CICIDS2018: ACC: 98, Recall: 88, Precision: 92, F-score: 90 | Deploy the proposed approach in distributed NIDS for robustness evaluation.
Eren et al. (2023) [107] | Proposes a novel tensor decomposition method to enhance the detection of unseen attacks, leveraging tensor factorization for improved generalization and identification of evolving threats. | Unsupervised DL, tensor factorization algorithm | - | Neris botnet, Spam e-mail | ROC-AUC, PR-AUC; Average ROC-AUC: 0.9661, Average PR-AUC: 0.9152 | Using a tensor decomposition algorithm with latent factors enables enhanced initialization of the tensor factorization algorithm for improved performance.
Lan et al. (2023) [108] | Presents a framework for unsupervised intrusion detection that transfers knowledge from known attacks to detect new ones using a hierarchical attention-based triplet network and unsupervised domain adaptation. | Attention network, unsupervised domain adaptation | ISCX-2012, UNSW-NB15, CIC-IDS2017, CTU-13 | HTTP DoS, Brute force SSH, PortScan, Brute force, DoS Slowloris, Web attack XSS, Neris, Rbot, Fuzzers, Generic, Shellcode, Exploits | Average ACC: 96.38, Average DR: 94.16 | Utilize a tensor decomposition algorithm with latent factors for enhanced initialization and performance.
Boppana and Bagade (2023) [109] | Synergy of GANs and AE for unsupervised intrusion detection in MQTT networks. | AE, GAN | MQTT-IoT-IDS2020 | Normal, Attack (binary class) | F1-score: 97 | Evaluate generalizability to diverse network architectures in future work.
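
Autoencoders recur throughout the table above as reconstruction-based detectors: the model is trained on (mostly) normal traffic, and records with a large reconstruction error are flagged as anomalous. The sketch below illustrates that idea only; it uses scikit-learn's MLPRegressor as a lightweight stand-in for a deep AE, with synthetic data and an illustrative 99th-percentile threshold, none of which come from the surveyed works.

```python
# Minimal sketch of the autoencoder idea: train on (mostly) normal traffic,
# score by reconstruction error, flag records above a percentile threshold.
# MLPRegressor is a lightweight stand-in for a deep autoencoder here.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
X_train = rng.normal(0, 1, size=(2000, 20))             # assumed normal traffic
X_test = np.vstack([rng.normal(0, 1, size=(480, 20)),   # normal
                    rng.normal(4, 1, size=(20, 20))])   # injected anomalies

scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Bottleneck architecture: 20 -> 8 -> 20 (the input is reconstructed at the output).
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X_train_s, X_train_s)

def reconstruction_error(model, X):
    return np.mean((model.predict(X) - X) ** 2, axis=1)

# Threshold set at the 99th percentile of the training reconstruction error.
threshold = np.percentile(reconstruction_error(ae, X_train_s), 99)
flags = reconstruction_error(ae, X_test_s) > threshold
print(f"Flagged {flags.sum()} of {len(X_test)} test records")
```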

Results of 13 selected UL methods for NAD on the CIC-IDS2017 dataset

Algorithms | Family | Accuracy | Precision | Recall | F1-score
K-NN | Distance-based | 96.4 | 96.7 | 96.0 | 97.0
ODIN | Distance-based | 97.6 | 97.4 | 97.5 | 98.0
LOF | Density-based | 96 | 95.3 | 93.6 | 94.8
COF | Density-based | 93.3 | 93.2 | 95 | 93.7
K-means | Clustering-based | 90.2 | 87.4 | 88.6 | 89.5
DBSCAN | Clustering-based | 93.7 | 92.7 | 90.6 | 92.3
EM | Clustering-based | 94.3 | 93.2 | 91.2 | 92.4
PCA | Dimensionality-reduction | 96.3 | 95.4 | 94.9 | 95.1
KPCA | Dimensionality-reduction | 97.3 | 97.2 | 96.9 | 97
ICA | Dimensionality-reduction | 96.3 | 96.2 | 96 | 96.3
HBOS | Statistical-based | 95.9 | 94.9 | 93.5 | 93.0
One-class SVM | Classification-based | 98.3 | 98.1 | 99 | 98.4
IF | Classification-based | 99.6 | 99.5 | 99.2 | 99.4

Hardware and software specifications

Components | Specifications
Operating system | Windows 11
System type | 64-bit operating system, x64-based processor
Processor | 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40 GHz (2.42 GHz)
RAM | 8 GB
Python | 3.7
Datasets | NSL-KDD, UNSW-NB15, CIC-IDS2017
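
For context on how the accuracy, precision, recall, and F1-score figures in the following tables could be produced in such an environment, here is a hedged evaluation sketch: it fits three of the thirteen selected detectors (IF, One-class SVM, LOF) with scikit-learn on synthetic data and scores a labeled test split. The data, contamination settings, and hyperparameters are placeholders, not the configuration used in this study.

```python
# Hedged sketch of an unsupervised evaluation loop: fit detectors on training
# features, then score a labeled test split (0 = normal, 1 = anomaly).
# This mirrors the setup conceptually; it is not the authors' code.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(7)
X_train = rng.normal(0, 1, size=(3000, 15))               # unlabeled training data
X_test = np.vstack([rng.normal(0, 1, size=(900, 15)),
                    rng.normal(5, 1, size=(100, 15))])
y_test = np.array([0] * 900 + [1] * 100)                  # ground truth used only for scoring

detectors = {
    "IF": IsolationForest(contamination=0.1, random_state=0),
    "One-class SVM": OneClassSVM(nu=0.1, kernel="rbf", gamma="scale"),
    "LOF": LocalOutlierFactor(n_neighbors=20, novelty=True, contamination=0.1),
}

for name, model in detectors.items():
    model.fit(X_train)
    # scikit-learn returns +1 for inliers and -1 for outliers; map to 0/1 labels.
    y_pred = (model.predict(X_test) == -1).astype(int)
    print(f"{name}: ACC={accuracy_score(y_test, y_pred):.3f} "
          f"P={precision_score(y_test, y_pred):.3f} "
          f"R={recall_score(y_test, y_pred):.3f} "
          f"F1={f1_score(y_test, y_pred):.3f}")
```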

Results of 13 selected UL methods for NAD on the NSL-KDD dataset

Algorithms | Family | Accuracy | Precision | Recall | F1-score
K-NN | Neighbor-based | 98.4 | 98.0 | 97.9 | 98.5
ODIN | Neighbor-based | 99.4 | 99.0 | 98.6 | 98.8
LOF | Density-based | 97.0 | 96.8 | 95.3 | 95.6
COF | Density-based | 94.0 | 94.5 | 96.1 | 95.1
K-means | Clustering-based | 94 | 92.46 | 92.3 | 93
DBSCAN | Clustering-based | 94.5 | 94.0 | 92.9 | 94.0
EM | Clustering-based | 95.0 | 93 | 92.9 | 94.0
PCA | Dimensionality-reduction | 97.8 | 97.0 | 96.7 | 97.0
KPCA | Dimensionality-reduction | 99.2 | 99.0 | 98.9 | 98.6
ICA | Dimensionality-reduction | 98.0 | 97.5 | 97.8 | 98.1
HBOS | Statistical-based | 94.4 | 93.9 | 95.8 | 96.0
One-class SVM | Classification-based | 99.4 | 99.2 | 99.2 | 99.5
IF | Classification-based | 99.9 | 99.8 | 99.6 | 99.7

Results of 13 selected UL methods for NAD on the UNSW-NB15 dataset

Algorithms | Family | Accuracy | Precision | Recall | F1-score
K-NN | Distance-based | 98.2 | 97.8 | 97.6 | 97.7
ODIN | Distance-based | 99.2 | 98.9 | 98.5 | 98.6
LOF | Density-based | 96.9 | 95.8 | 95.1 | 95.0
COF | Density-based | 93.8 | 94.2 | 95.9 | 94.7
K-means | Clustering-based | 93.7 | 92.0 | 92.1 | 92.9
DBSCAN | Clustering-based | 94.2 | 93.8 | 92.7 | 93.5
EM | Clustering-based | 95.5 | 92.6 | 92.3 | 93.4
PCA | Dimensionality-reduction | 97.2 | 96.9 | 97.3 | 97.1
KPCA | Dimensionality-reduction | 99.1 | 98.5 | 98.5 | 98.7
ICA | Dimensionality-reduction | 97.7 | 97.4 | 97.1 | 97.3
HBOS | Statistical-based | 92.3 | 91.9 | 94.6 | 94.8
One-class SVM | Classification-based | 99.2 | 99.0 | 99.2 | 99.4
IF | Classification-based | 99.7 | 99.5 | 99.3 | 99.6
Language: English
Submitted on: Nov 24, 2023 | Published on: May 19, 2024

© 2024 Niharika Sharma, Bhavna Arora, Shabana Ziyad, Pradeep Kumar Singh, Yashwant Singh, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.