Data Sharing and Use in Cybersecurity Research

By: Inna Kouper and Stacy Stone
Open Access | Jan 2024

Figures & Tables

Table 1

Venues of Sampled Publications.

PUBLICATION VENUE | COUNT | PERCENT
ACM Computer and Communications Security Conference (CCS) | 40 | 23%
USENIX Security Symposium | 40 | 23%
IEEE Symposium on Security and Privacy (SP) | 21 | 12%
Network and Distributed System Security Symposium (NDSS) | 17 | 10%
ArXiv | 8 | 5%
Other (journals, conferences) | 45 | 26%
Total | 171 | 100%
Table 2

Number of Authors in Publications.

TOTAL NUMBER OF AUTHORS | NUMBER OF PAPERS | PERCENT
2 | 20 | 12%
3 | 39 | 23%
4 | 33 | 19%
5 | 21 | 12%
6 | 22 | 13%
7 | 13 | 8%
8 | 12 | 7%
9 | 4 | 2%
10 | 4 | 2%
11 | 2 | 1%
12 | 1 | 1%
Mean # authors per paper | 4.81
Standard deviation | 2.21
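The summary rows of Table 2 can be recomputed from the frequency counts above. The sketch below is an illustration only, not the authors' procedure: the variable names are hypothetical, and it assumes the reported standard deviation is the sample (n - 1) form rather than the population form.

```python
import statistics

# Frequency table from Table 2: number of authors -> number of papers.
authors_to_papers = {
    2: 20, 3: 39, 4: 33, 5: 21, 6: 22, 7: 13,
    8: 12, 9: 4, 10: 4, 11: 2, 12: 1,
}

# Expand the frequency table into one observation per paper.
author_counts = [
    n_authors
    for n_authors, n_papers in authors_to_papers.items()
    for _ in range(n_papers)
]

total_papers = len(author_counts)
mean_authors = statistics.mean(author_counts)

# Assumption: sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(author_counts)
# Population form (divides by n), shown for comparison.
population_sd = statistics.pstdev(author_counts)

print(f"papers={total_papers} mean={mean_authors:.2f} "
      f"sample_sd={sample_sd:.2f} population_sd={population_sd:.2f}")
```

Under that assumption the counts give a mean of 4.81 and a standard deviation of 2.21 across 171 papers, matching the table; the population form rounds to 2.20 instead.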
Table 3

Positions of First Authors in Publications.

FIRST AUTHOR POSITION | NUMBER OF PAPERS | PERCENT
Graduate student (PhD) | 128 | 75%
Faculty | 18 | 11%
Postdoctoral researcher | 8 | 5%
Graduate student (MS) | 8 | 5%
Other | 9 | 5%
Total | 171 | 100%
Table 4

First Author Positions by Gender.

FIRST AUTHOR POSITION | GENDER: FEMALE | GENDER: MALE
Graduate student (MS or PhD) | 18 (13%) | 118 (87%)
Faculty | 3 (17%) | 15 (83%)
Postdoc | 3 (37%) | 5 (63%)
Other | 0 | 9 (100%)
Table 5

Primary Types of Analysis in Publications.

TYPE OF ANALYSIS | FREQUENCY | PERCENT
Prototype and evaluation | 81 | 47%
Algorithm development and testing | 42 | 25%
Vulnerability analysis | 13 | 8%
Conceptual model | 12 | 7%
Machine learning application | 8 | 5%
Statistical analysis | 8 | 5%
Other | 7 | 4%
Total | 171 | 100%
Table 6

Number of Datasets in Each Paper (N_datasets = 387).

DATASETS IN EACH PAPER | NUMBER OF PAPERS | PERCENT
1 | 61 | 16%
2 | 54 | 14%
3 | 93 | 24%
4 | 36 | 9%
5 | 55 | 14%
6 or more | 88 | 23%
Mean | 2.6
Table 7

Origin of the Datasets Used in Cybersecurity Research.

DATA ORIGIN | NUMBER OF DATASETS | PERCENT
Existing | 211 | 55%
Collected | 105 | 27%
Simulated | 17 | 4%
Synthetic | 10 | 3%
Other | 44 | 11%
Total | 387 | 100%
Figure 1

Public Availability of Datasets in the Sample.

Figure 2

Public Availability of Computing and Analytical Tools for Data Processing.

Table 8

Types of Data Availability in Publications.

TYPE OF DATA AVAILABILITY | NUMBER OF DATASETS | PERCENT
URL to a repository | 47 | 28%
Citation, no URL | 40 | 24%
No URL or citation | 40 | 24%
URL to dataset | 21 | 13%
Broken link | 16 | 10%
DOI | 1 | 1%
Total | 165 | 100%
Table 9

Availability of the Previously Existing Datasets.

EXISTING DATASETS | DATASETS IN PUBLICATIONS | PERCENT
Public | 119 | 56%
Not available | 75 | 36%
Restricted access | 13 | 6%
Other | 4 | 2%
Total | 211 | 100%
Table 10

Nature of the Existing Cybersecurity Datasets per Zheng et al. (2018) Classification.

CATEGORY | EXAMPLES | NUMBER OF DATASETS | PERCENT
User and organization characteristics | Patient or financial records, social media, reviews | 57 | 27%
Attacker-related | Malware, vulnerability data, security certificates | 49 | 23%
Internet characteristics | Network traces, IP packets, access logs | 49 | 23%
Defender artifacts | Security alerts, non-leaked password databases | 13 | 6%
Other | Images, citation data, web pages | 43 | 20%
Total | | 211 | 100%
Table 11

Nature of the Existing Cybersecurity Datasets per Sauerwein et al. (2019) Classification.

CATEGORY | EXAMPLES | NUMBER OF DATASETS | PERCENT
Asset | Whitelists, network traffic, emails, images | 150 | 71%
Threat | Security alerts, data breaches | 32 | 15%
Countermeasure | Spam samples, VirusTotal samples, security certificates | 13 | 6%
Attack | DDoS attack data | 7 | 4%
Vulnerability | Vulnerability data | 8 | 4%
Risk | Market transactions | 1 |
Total | | 211 | 100%
Table 12

Availability of the Collected Datasets.

CATEGORY | NUMBER OF DATASETS | PERCENT
Not available | 91 | 87%
Public | 12 | 12%
Available upon request | 2 | 1%
Total | 104 | 100%
Figure 3

Data and Cybersecurity Research, from Present to Future.

Language: English
Submitted on: Jun 20, 2023
Accepted on: Dec 15, 2023
Published on: Jan 19, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Inna Kouper, Stacy Stone, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.