Table 1
Risk factors for scientific data collections.
| No. | Risk Factor | Description |
|---|---|---|
| 1. | Lack of use | Data are rarely accessed, deemed ‘unwanted’, and thrown away |
| 2. | Loss of funding for archive | The whole archive loses its funding source |
| 3. | Loss of funding for specific datasets | Lack of funding to monitor, maintain, and otherwise work with specific data |
| 4. | Loss of knowledge around context or access | Loss of individuals (e.g. through retirement or death) who know how to access the data, or who know the associated metadata that make the data usable to others |
| 5. | Lack of documentation & metadata | Data cannot be interpreted due to lack of contextual knowledge |
| 6. | Data mislabeling | Data are lost because they are poorly identified (either physically or digitally) |
| 7. | Catastrophes | Fires, floods, wars/human conflicts, etc. |
| 8. | Poor data governance | Uncertain or unknown decision-making processes impede effective data management |
| 9. | Legal status for ownership and use | Uncertain, unknown, or restrictive legal status limits the possible uses of data |
| 10. | Media deterioration | Physical media deterioration prevents data from being accessed (paper, tape, or digital media) |
| 11. | Missing files | Data files are lost without any known reason |
| 12. | Dependence on service provider | Risk of a single point of failure if a particular service provider goes out of business |
| 13. | Accidental deletion | Data are accidentally deleted through staff error |
| 14. | Lack of planning | Lack of planning leaves data collections susceptible to unexpected events |
| 15. | Cybersecurity breach | Data are intentionally deleted or corrupted via a security breach, e.g. malware |
| 16. | Over-abundance | Difficulty dealing with too much data results in reduction in value or quality of whole collections |
| 17. | Political interference | Data are deleted or made inaccessible due to political decisions |
| 18. | Lack of provenance information | Data cannot be trusted or understood because of a lack of information about data processing steps, or about data stewardship chains of trust |
| 19. | File format obsolescence | Data cannot be accessed due to lack of knowledge, equipment, or software for reading a specific file format |
| 20. | Storage hardware breakdown | Sudden & catastrophic malfunction of storage hardware |
| 21. | Bit rot and data corruption | Gradual corruption of digital data due to an accumulation of non-critical failures (bits flipping) in a data storage device |
Table 2
Methods for Categorizing Data Risks.
| Categorization Method | Description |
|---|---|
| Severity of risk | How much impact a risk factor could have on the data itself, regardless of the current importance of the data to the user |
| Likelihood of occurrence | How likely a risk factor is to occur |
| Length of recovery time | How long it would take to recover data or re-establish data accessibility |
| Impact on user | How significantly data users are impacted by data loss or loss of data accessibility |
| Who is responsible for addressing the problem | Who has the expertise and responsibility to mitigate or respond to particular risk factors |
| Cause of problem | What caused a data risk factor to occur |
| Degree of control | How much control an organization or individual has over whether a risk factor is present or will occur |
| Proactive vs reactive response | Whether risk factors can be mitigated via preventative measures, or whether they must be responded to upon occurrence |
| Nature of mitigation | What steps must be taken or processes put in place to prevent a risk, or mitigate a risk after it has occurred |
| Resources required for mitigation | What time, money, or personnel resources will be necessary to mitigate risk factors |
Table 3
Example of a blank data risk assessment matrix, after selection of specific risk factors and categorization methods of interest.
| Risk Factors | Categorization Methods | | | |
|---|---|---|---|---|
| | Severity of risk | Likelihood of occurrence | Cause of problem | Resources req’d for mitigation |
| Lack of use | | | | |
| Loss of knowledge | | | | |
| Lack of docs & metadata | | | | |
| Catastrophes | | | | |
| Poor data governance | | | | |
| Media deterioration | | | | |
| Risk Factors | Categorization Methods | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | Severity of risk | Likelihood of occurrence | Length of recovery | Impact on user | Who is responsible | Cause of problem | Degree of control | Proactive vs reactive response | Nature of mitigation | Resources req’d for mitigation |
| Lack of use | | | | | | | | | | |
| Loss of funding for archive | | | | | | | | | | |
| Loss of funding for specific datasets | | | | | | | | | | |
| Loss of knowledge | | | | | | | | | | |
| Lack of docs & metadata | | | | | | | | | | |
| Data mislabeling | | | | | | | | | | |
| Catastrophes | | | | | | | | | | |
| Poor data governance | | | | | | | | | | |
| Legal status for ownership and use | | | | | | | | | | |
| Media deterioration | | | | | | | | | | |
| Missing files | | | | | | | | | | |
| Dependence on service provider | | | | | | | | | | |
| Accidental deletion | | | | | | | | | | |
| Lack of planning | | | | | | | | | | |
| Cybersecurity breach | | | | | | | | | | |
| Over-abundance | | | | | | | | | | |
| Political interference | | | | | | | | | | |
| Lack of provenance information | | | | | | | | | | |
| File format obsolescence | | | | | | | | | | |
| Bit rot and data corruption | | | | | | | | | | |
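
For readers who want to build such a matrix programmatically, the following is a minimal sketch in Python, not a tool described in the source. It assumes the matrix can be represented as a nested dictionary (one row per selected risk factor, one empty cell per selected categorization method). The function names `blank_matrix` and `to_markdown` are hypothetical, and the labels are the ones used in the Table 3 example.

```python
from typing import Dict, List, Optional

# Selections copied from the Table 3 example; any subset of the
# risk factors (Table 1) and categorization methods (Table 2) works.
RISK_FACTORS: List[str] = [
    "Lack of use",
    "Loss of knowledge",
    "Lack of docs & metadata",
    "Catastrophes",
    "Poor data governance",
    "Media deterioration",
]
CATEGORIZATION_METHODS: List[str] = [
    "Severity of risk",
    "Likelihood of occurrence",
    "Cause of problem",
    "Resources req'd for mitigation",
]

# Hypothetical representation: risk factor -> method -> assessment (None = not yet filled in).
Matrix = Dict[str, Dict[str, Optional[str]]]


def blank_matrix(risk_factors: List[str], methods: List[str]) -> Matrix:
    """Return one row per risk factor with one empty cell (None) per method."""
    return {factor: {method: None for method in methods} for factor in risk_factors}


def to_markdown(matrix: Matrix, methods: List[str]) -> str:
    """Render the matrix as a pipe table; empty cells stay blank."""
    header = "| Risk Factors | " + " | ".join(methods) + " |"
    rule = "|" + "---|" * (len(methods) + 1)
    rows = [
        "| " + factor + " | " + " | ".join(cells[m] or "" for m in methods) + " |"
        for factor, cells in matrix.items()
    ]
    return "\n".join([header, rule, *rows])


matrix = blank_matrix(RISK_FACTORS, CATEGORIZATION_METHODS)
print(to_markdown(matrix, CATEGORIZATION_METHODS))
```

Passing all 21 risk factors and all 10 categorization methods to `blank_matrix` would give the full matrix in the same form. Cell values are left as free text here because the source does not prescribe a scoring scale.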
