Figures & Tables

Table 1

Risk factors for scientific data collections.

| No. | Risk Factor | Description |
|---|---|---|
| 1 | Lack of use | Data are rarely accessed and dubbed ‘unwanted’, thus getting thrown away |
| 2 | Loss of funding for archive | The whole archive loses its funding source |
| 3 | Loss of funding for specific datasets | Lack of funding to monitor, maintain, and otherwise work with specific data |
| 4 | Loss of knowledge around context or access | The loss of individuals who know how to access the data or know the metadata associated with these data that make the data useable to others, e.g. due to retirement or death |
| 5 | Lack of documentation & metadata | Data cannot be interpreted due to lack of contextual knowledge |
| 6 | Data mislabeling | Data are lost because they are poorly identified (either physically or digitally) |
| 7 | Catastrophes | Fires, floods, wars/human conflicts, etc. |
| 8 | Poor data governance | Uncertain or unknown decision making processes impede effective data management |
| 9 | Legal status for ownership and use | Uncertain, unknown, or restrictive legal status limits the possible uses of data |
| 10 | Media deterioration | Physical media deterioration prevents data from being accessed (paper, tape, or digital media) |
| 11 | Missing files | Data files are lost without any known reason |
| 12 | Dependence on service provider | Risks due to potential single point of failure problems if a particular service provider goes out of business |
| 13 | Accidental deletion | Data are accidentally deleted by a staff error |
| 14 | Lack of planning | Lack of planning puts data collections at risk of being susceptible to unexpected events |
| 15 | Cybersecurity breach | Data are intentionally deleted or corrupted via a security breach, e.g. malware |
| 16 | Over-abundance | Difficulty dealing with too much data results in reduction in value or quality of whole collections |
| 17 | Political interference | Data deleted or made inaccessible due to political decisions |
| 18 | Lack of provenance information | Data cannot be trusted or understood because of a lack of information about data processing steps, or about data stewardship chains of trust |
| 19 | File format obsolescence | Data cannot be accessed due to lack of knowledge, equipment, or software for reading a specific file format |
| 20 | Storage hardware breakdown | Sudden & catastrophic malfunction of storage hardware |
| 21 | Bit rot and data corruption | Gradual corruption of digital data due to an accumulation of non-critical failures (bits flipping) in a data storage device |

Table 2

Methods for Categorizing Data Risks.

| Categorization Method | Description |
|---|---|
| Severity of risk | How much impact could this risk factor have on the data itself, regardless of the current importance of data to the user? |
| Likelihood of occurrence | How likely a risk factor is to occur |
| Length of recovery time | How long it would take to recover data or re-establish data accessibility |
| Impact on user | How significantly data users are impacted by data loss or loss of data accessibility |
| Who is responsible for addressing the problem | Who has the expertise and responsibility to mitigate or respond to particular risk factors |
| Cause of problem | What caused a data risk factor to occur |
| Degree of control | How much control an organization or individual has over whether a risk factor is present or will occur |
| Proactive vs reactive response | Whether risk factors can be mitigated via preventative measures, or whether they must be responded to upon occurrence |
| Nature of mitigation | What steps must be taken or processes put in place to prevent a risk, or mitigate a risk after it has occurred |
| Resources required for mitigation | What time, money, or personnel resources will be necessary to mitigate risk factors |

Table 3

Example of a blank data risk assessment matrix, after selection of specific risk factors and categorization methods of interest.

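Rows are risk factors; columns are categorization methods.

| Risk Factors | Severity of risk | Likelihood of occurrence | Cause of problem | Resources required for mitigation |
|---|---|---|---|---|
| Lack of use | | | | |
| Loss of knowledge | | | | |
| Lack of docs & metadata | | | | |
| Catastrophes | | | | |
| Poor data governance | | | | |
| Media deterioration | | | | |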
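
A blank matrix like the one in Table 3 can also be generated programmatically once the risk factors and categorization methods of interest have been chosen. The following is a minimal sketch, not part of the original article: the function names `blank_matrix` and `matrix_to_csv`, the dictionary layout, and the choice of CSV as an output format are all illustrative assumptions.

```python
import csv
import io

# Rows and columns selected for this illustration; they mirror Table 3.
RISK_FACTORS = [
    "Lack of use",
    "Loss of knowledge",
    "Lack of docs & metadata",
    "Catastrophes",
    "Poor data governance",
    "Media deterioration",
]
CATEGORIZATION_METHODS = [
    "Severity of risk",
    "Likelihood of occurrence",
    "Cause of problem",
    "Resources required for mitigation",
]


def blank_matrix(risk_factors, methods):
    """Hypothetical helper: map each risk factor to one empty cell per method."""
    return {factor: {method: "" for method in methods} for factor in risk_factors}


def matrix_to_csv(matrix, methods):
    """Hypothetical helper: serialize the matrix as CSV, one row per risk factor."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Risk factor", *methods])
    for factor, cells in matrix.items():
        writer.writerow([factor, *(cells[method] for method in methods)])
    return buffer.getvalue()


matrix = blank_matrix(RISK_FACTORS, CATEGORIZATION_METHODS)
csv_text = matrix_to_csv(matrix, CATEGORIZATION_METHODS)
print(csv_text)
```

The resulting CSV text can be opened in a spreadsheet and filled in by whoever carries out the assessment; swapping in a different selection of rows or columns only requires editing the two lists.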
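
Rows are risk factors; columns are categorization methods.

| Risk Factors | Severity of risk | Likelihood of occurrence | Length of recovery | Impact on user | Who is responsible | Cause of problem | Degree of control | Proactive vs reactive response | Nature of mitigation | Resources required for mitigation |
|---|---|---|---|---|---|---|---|---|---|---|
| Lack of use | | | | | | | | | | |
| Loss of funding for archive | | | | | | | | | | |
| Loss of funding for specific datasets | | | | | | | | | | |
| Loss of knowledge | | | | | | | | | | |
| Lack of docs & metadata | | | | | | | | | | |
| Data mislabeling | | | | | | | | | | |
| Catastrophes | | | | | | | | | | |
| Poor data governance | | | | | | | | | | |
| Legal status for ownership and use | | | | | | | | | | |
| Media deterioration | | | | | | | | | | |
| Missing files | | | | | | | | | | |
| Dependence on service provider | | | | | | | | | | |
| Accidental deletion | | | | | | | | | | |
| Lack of planning | | | | | | | | | | |
| Cybersecurity breach | | | | | | | | | | |
| Over-abundance | | | | | | | | | | |
| Political interference | | | | | | | | | | |
| Lack of provenance information | | | | | | | | | | |
| File format obsolescence | | | | | | | | | | |
| Storage hardware breakdown | | | | | | | | | | |
| Bit rot and data corruption | | | | | | | | | | |
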
Language: English
Submitted on: Dec 19, 2019
Accepted on: Feb 2, 2020
Published on: Mar 12, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Matthew S. Mayernik, Kelsey Breseman, Robert R. Downs, Ruth Duerr, Alexis Garretson, Chung-Yi (Sophie) Hou, Environmental Data Governance Initiative (EDGI) and Earth Science Information Partners (ESIP) Data Stewardship Committee, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.