
State of the Data: Assessing the FAIRness of US Geological Survey Data

Open Access
Apr 2024

Figures & Tables

Table 1

List of score categories, showing the number of questions in each. ‘FAIR’: Findable, Accessible, Interoperable, Reusable. For a list of all questions, see ‘Supplemental File 1: FAIR Rubric Questions’.

Score category | Number of questions
Total FAIR score | 62
Findable score | 24
Accessible score | 8
Interoperable score | 18
Reusable score | 12
Essential score | 37
Intermediate score | 15
Advanced score | 10
Figure 1

Horizontal box plot with overlaid data points showing the overall Findable, Accessible, Interoperable, and Reusable (FAIR) scores for all 392 assessments. Each score is normalized to a maximum of 100 and does not take into account questions that are not applicable.

Figure 2

Horizontal box plot with overlaid data points showing scores for all 392 assessments, broken down by the four FAIR principles: Findable, Accessible, Interoperable, and Reusable. Each score is normalized to a maximum of 100 and does not take into account questions that are not applicable.

Figure 3

Horizontal box plot with overlaid data points showing scores for all 392 assessments, broken down by the three levels of FAIR characteristics: Essential, Intermediate, and Advanced. Each score is normalized to a maximum of 100 and does not take into account questions that are not applicable.
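The figure captions state only that each score is scaled to a maximum of 100 and ignores questions that are not applicable. The Python sketch below shows one way such a score could be computed, assuming each ‘Yes’ counts 1, each ‘No’ counts 0, and ‘Not Applicable’ (N/A) answers are dropped from the denominator. The function names, the input layout, and the use of the question-ID prefix (F_, A_, I_, R_) to assign a question to a FAIR principle are illustrative assumptions, not the authors' published procedure; the per-category question counts come from Table 1.

```python
from collections import Counter

# Number of rubric questions per score category, from Table 1.
FAIR_COUNTS = {"Findable": 24, "Accessible": 8, "Interoperable": 18, "Reusable": 12}
LEVEL_COUNTS = {"Essential": 37, "Intermediate": 15, "Advanced": 10}

# Both groupings partition the same 62 rubric questions.
assert sum(FAIR_COUNTS.values()) == 62
assert sum(LEVEL_COUNTS.values()) == 62


def normalized_score(answers):
    """Score answers on a 0-100 scale, excluding 'N/A' (assumed weighting).

    Each 'Yes' counts 1 and each 'No' counts 0. Returns None when no
    question is applicable, because the score is then undefined.
    """
    tally = Counter(answers)
    applicable = tally["Yes"] + tally["No"]
    if applicable == 0:
        return None
    return 100 * tally["Yes"] / applicable


def category_scores(answers_by_question, category_of):
    """Group answers with a hypothetical category_of(question_id) mapping, then score each group."""
    grouped = {}
    for qid, answer in answers_by_question.items():
        grouped.setdefault(category_of(qid), []).append(answer)
    return {cat: normalized_score(ans) for cat, ans in grouped.items()}


# Hypothetical answers for one data release (not real assessment data).
example = {"F_4.0.1": "Yes", "F_4.0.8": "N/A", "F_3.0": "No", "I_5.0": "No", "R_1.0": "Yes"}
print(normalized_score(example.values()))  # 50.0 (2 'Yes' of 4 applicable)
print(category_scores(example, lambda qid: qid.split("_")[0]))
# {'F': 50.0, 'I': 0.0, 'R': 100.0}
```

Under this assumption a data release is not penalized for questions that do not apply to it, such as F_4.0.8 (data revision dates) for a release that was never revised.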

Figure 4

Bar chart showing the distribution of the 392 datasets by publication year.

Figure 5

Box plots and data points showing the total normalized Findable, Accessible, Interoperable, and Reusable (FAIR) scores for datasets by publication year.

Table 2

The 10 rubric questions with the highest number of ‘Yes’ answers.

Question ID | Number ‘Yes’ | Question
F_4.0.6 | 392 | Is the following descriptive information included in the data release’s metadata? Data publication date
F_4.0.1 | 392 | Is the following descriptive information included in the data release’s metadata? Title
F_4.0.2 | 392 | Is the following descriptive information included in the data release’s metadata? Description (e.g., Abstract, Summary, Purpose)
F_4.0.11 | 391 | Is the following descriptive information included in the data release’s metadata? Keywords
F_2.0 | 390 | Is a separate identifier assigned for the data release’s metadata record?
F_2.1 | 390 | Is the assigned identifier persistent?
F_2.2 | 390 | Is the assigned identifier unique (i.e., has a unique value)?
A_1.1 | 390 | Is this landing page publicly accessible?
F_4.0.10 | 389 | Is the following descriptive information included in the data release’s metadata? Temporal information associated with the data release (e.g., start date and end date for when data were collected)
F_4.0.9 | 388 | Is the following descriptive information included in the data release’s metadata? Geographic location(s) associated with the data release (e.g., coordinates)
Figure 6

Stacked bar chart showing the distribution of ‘Yes’, ‘No’, and ‘Not Applicable’ (N/A) answers for the rubric questions with the highest number of ‘Yes’ answers.

Table 3

The 12 rubric questions with the highest number of ‘No’ answers. ORCID: Open Researcher and Contributor IDs.

Question ID | Number ‘No’ | Question
I_5.0 | 392 | Is the data release described using Resource Description Framework (RDF)/linked data with community-recognized ontologies?
F_3.0 | 390 | Are the authors/originators’ ORCID identifiers viewable (to humans) on the data release’s landing page?
F_3.1 | 379 | Are the authors/originators’ ORCID identifiers provided in the data release’s metadata?
R_1.2 | 363 | Is the approved USGS disclaimer statement present on the data release’s landing page?
R_2.3.2 | 359 | Is the following information included with the data release’s metadata? Citation(s) to the citable (community recognized) guidelines or standards used to describe the data quality information (e.g., using ISO 19157)
A_3.0.2 | 321 | Is the following information included with the data release’s landing page? Data distributor contact information
I_2.3 | 312 | Are all data files in a format that is: Available in multiple file formats
R_1.3 | 296 | Are recommended reuses present on the data release’s landing page? AND/OR Are known reuse limits included on the data release’s landing page?
R_2.2.2 | 291 | Is the following information included with the data release’s metadata? Citation(s) to the citable (community recognized) guidelines or standards used to describe the process/methodology information
R_2.3.1 | 278 | Is the following information included with the data release’s metadata? Detailed data quality information (e.g., data quality procedure documentation; data quality monitoring criteria during data collection, whether/how the completeness of the data files and their data values was evaluated)
R_1.1 | 265 | Are recommended reuses included in the data release’s metadata? AND/OR Are known reuse limits included in the data release’s metadata?
I_4.1 | 209 | Is information about data value consistency documented in the metadata?
Figure 7

Stacked bar chart showing the distribution of ‘Yes’, ‘No’, and ‘Not Applicable’ (N/A) answers for the rubric questions with the highest number of ‘No’ answers.

Table 4

The 11 rubric questions with the highest number of ‘Not Applicable’ answers.

Question ID | Number ‘Not Applicable’ | Question
F_4.0.8 | 359 | Is the following descriptive information included in the data release’s metadata? If applicable, data revision dates
F_4.0.7 | 349 | Is the following descriptive information included in the data release’s metadata? If applicable, data version
I_6.2 | 316 | If there are related data releases (other than source input datasets), are the relationships between the data releases: Described using Resource Description Framework (RDF)/linked data
I_6.1 | 313 | If there are related data releases (other than source input datasets), are the relationships between the data releases: Documented in the metadata
I_3.1.2 | 197 | Does the data release’s metadata contain the following information about the data release’s attributes? ALL names/labels are using citable and publicly available sources
R_2.1 | 188 | Is the following information included with the data release’s metadata? If input datasets are used, the citations to the input datasets
I_3.1.1 | 162 | Does the data release’s metadata contain the following information about the data release’s attributes? At least one name/label is using a citable and publicly available source
I_3.6 | 109 | Does the data release’s metadata contain the following information about the data release’s attributes? Allowable data values
R_3.0 | 80 | Related resources documented in the data release’s metadata (e.g., project website, publications, use cases, job aids, user’s guide, data processing code with readme, product algorithm document)
I_3.3 | 58 | Does the data release’s metadata contain the following information about the data release’s attributes? Units
I_3.5 | 57 | Does the data release’s metadata contain the following information about the data release’s attributes? Data value range
Figure 8

Stacked bar chart showing the distribution of ‘Yes’, ‘No’, and ‘Not Applicable’ (N/A) answers for the rubric questions with the highest number of ‘Not Applicable’ answers.
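Tables 2 to 4 rank rubric questions by how often each answer was given across the 392 assessments. A minimal Python sketch of that tally follows. It assumes a hypothetical input layout with one dictionary per assessed data release, mapping question IDs to ‘Yes’, ‘No’, or ‘N/A’, and it breaks ties by question ID; the published tables may order tied questions differently (for example, the three questions tied at 392 in Table 2).

```python
from collections import Counter


def answer_counts(assessments):
    """Count 'Yes', 'No', and 'N/A' answers per question ID.

    `assessments` is a list of dicts (one per data release) mapping
    question ID -> answer; this layout is a hypothetical example.
    """
    counts = {}
    for record in assessments:
        for qid, answer in record.items():
            counts.setdefault(qid, Counter())[answer] += 1
    return counts


def top_questions(counts, answer, k):
    """Return the k (question ID, count) pairs with the most `answer` responses.

    Sorted by count, highest first; ties fall back to question ID order.
    A Counter returns 0 for an answer a question never received.
    """
    ranked = sorted(counts, key=lambda qid: (-counts[qid][answer], qid))
    return [(qid, counts[qid][answer]) for qid in ranked[:k]]


# Three hypothetical assessments (not real data).
assessments = [
    {"F_4.0.1": "Yes", "I_5.0": "No", "F_4.0.8": "N/A"},
    {"F_4.0.1": "Yes", "I_5.0": "No", "F_4.0.8": "Yes"},
    {"F_4.0.1": "Yes", "I_5.0": "No", "F_4.0.8": "N/A"},
]
counts = answer_counts(assessments)
print(top_questions(counts, "Yes", 2))  # [('F_4.0.1', 3), ('F_4.0.8', 1)]
print(top_questions(counts, "N/A", 1))  # [('F_4.0.8', 2)]
```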

Table 5

The 12 rubric questions with the largest increase in ‘Yes’ answers after the USGS data policy was implemented in 2016.

Question ID | Change post-policy | Score pre-policy | Score post-policy | Question
F_1.1 | 55.3 | 44.4 | 99.6 | Is the assigned identifier persistent?
R_1.0 | 42.3 | 37.9 | 80.2 | Is an approved USGS disclaimer statement included in the data release’s metadata?
A_4.1 | 40.7 | 54.8 | 95.5 | Can users obtain the data release’s metadata files by manual actions (human)
F_1.2 | 36.3 | 63.7 | 100 | Is the assigned identifier unique (i.e., has a unique value)?
F_1.3 | 34.7 | 65.3 | 100 | Is the assigned identifier viewable on the data release’s landing page?
F_1.0 | 31.5 | 68.5 | 100 | Is an identifier assigned for the data release and documented in the data release’s metadata record?
I_2.1 | 25 | 64.5 | 89.6 | Are all data files in a format that is: Non-proprietary (open format, i.e., accessible via free software)
A_2.0 | 23.5 | 75 | 98.5 | Does the data release’s identifier resolve to the human readable landing page?
I_6.1 | 15.7 | 70 | 85.7 | If there are related data releases (other than source input datasets), are the relationships between the data releases: Documented in the metadata
I_2.4 | 15.1 | 83.1 | 98.1 | Are all data files in a format that is: Expected or commonly used by the relevant research community
R_2.2.2 | 14.1 | 14.8 | 28.9 | Is the following information included with the data release’s metadata? Citation(s) to the citable (community recognized) guidelines or standards used to describe the process/methodology information
I_3.3 | 10.2 | 60.4 | 70.6 | Does the data release’s metadata contain the following information about the data release’s attributes? Units
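In Table 5 the change column is the post-policy score minus the pre-policy score. The Python sketch below recomputes it from the rounded values printed in the table. Recomputing from rounded inputs gives values that differ from the printed change by 0.1 for three questions (F_1.1, I_2.1, I_2.4); the sketch assumes, without confirmation from the source, that the published changes were computed from unrounded scores, and therefore allows a 0.1 tolerance. The recomputed values still reproduce the table's ranking.

```python
# (published change, pre-policy score, post-policy score), copied from Table 5.
TABLE5 = {
    "F_1.1": (55.3, 44.4, 99.6),
    "R_1.0": (42.3, 37.9, 80.2),
    "A_4.1": (40.7, 54.8, 95.5),
    "F_1.2": (36.3, 63.7, 100.0),
    "F_1.3": (34.7, 65.3, 100.0),
    "F_1.0": (31.5, 68.5, 100.0),
    "I_2.1": (25.0, 64.5, 89.6),
    "A_2.0": (23.5, 75.0, 98.5),
    "I_6.1": (15.7, 70.0, 85.7),
    "I_2.4": (15.1, 83.1, 98.1),
    "R_2.2.2": (14.1, 14.8, 28.9),
    "I_3.3": (10.2, 60.4, 70.6),
}

# Change = post - pre, rounded to one decimal place like the table.
recomputed = {qid: round(post - pre, 1) for qid, (_, pre, post) in TABLE5.items()}

for qid, (published, _, _) in TABLE5.items():
    # Assumption: published changes came from unrounded scores, so a
    # recomputation from rounded inputs may be off by up to 0.1.
    assert abs(recomputed[qid] - published) <= 0.1 + 1e-9, qid

# Ranking by the recomputed change reproduces the table's row order.
ranking = sorted(recomputed, key=recomputed.get, reverse=True)
print(ranking == list(TABLE5))  # True
print({q: recomputed[q] for q in ("F_1.1", "I_2.1", "I_2.4")})
# {'F_1.1': 55.2, 'I_2.1': 25.1, 'I_2.4': 15.0}
```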
Figure 9

Horizontal bar plot of the 11 questions addressing elements affected by the 2016 USGS data policy implementation; every question shows an increase in the number of ‘Yes’ answers.

Table 6

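Strategies, with the FAIR elements each would improve, the estimated level of effort, and the return on investment (ROI).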

Strategy | Category | FAIR workshop proposed activity | FAIR element improved | Level of effort | ROI
R1: Convene repository managers to develop core shared standards for presentation of/access to data and metadata via landing pages | Data Repositories | 5–1, 5–12 | F, A | M | M
R2: Move repositories towards standard processes, workflows, and services for intake of new data releases | Data Repositories | 5–5, 5–17, 5–21 | F, A | M | H
P1: Reevaluate minimum characteristics for repositories to be considered for inclusion in the acceptable repositories list | Policy | 5–1 | F, A | M | M
P2: Clarify requirements for and implementation of disclaimers, licenses, and constraints on use and access | Policy | 2–1, 2–2, 2–14 | A, R | M | M
P3: Institute peer review process for comprehensive data management plans at project outset | Policy | 7–2 | A, R | M | H
C1: Convene working group to improve data quality documentation practices in metadata | Community & Training | | R | H | H
C2: Use community-based approach to define data dictionaries that support linked open data | Community & Training | 3–6 | I | H | H
C3: Convene repository managers to develop consistent practices for documenting version history and links between versions | Community & Training | 7–3 | F, A | M | M
C4: Consider developing training program for writing data management plans that anticipate and plan for FAIR requirements | Community & Training | 7–2 | F, A, R | M | H
C5: Use community-based approach to evaluate open and machine-readable data formats and develop best practices for implementation by scientists and repositories | Community & Training | 5–17, 5–21 | A, R | M | H
C6: Consider developing training to support broader understanding of persistent identifiers for access, credit, citation, and use of data | Community & Training | | A, R | L | M
C7: Leverage community groups to support adoption of shared classification schemes and vocabularies to describe and characterize data assets | Community & Training | | F, A, I | M | M
M1: Consider adoption of ISO to facilitate inclusion of more precise, unambiguous, and FAIR descriptions of dataset characteristics | Metadata | | F, A, I, R | H | H
M2: Optimize metadata editor tools to document data in a standards-agnostic language, to facilitate interoperability with applications, standards, and workflows | Metadata | | F, A, I, R | H | H
M3: Promote best practices for reusable metadata elements that are citable and discoverable on their own | Metadata | 2–4, 3–4, 3–6 | R | M | M
M4: Improve metadata tools, as informed by usability analyses | Metadata | | F, A, I, R | M | M
M5: Evaluate opportunities to apply AI/ML tools to metadata assessments, possibly broadening range of applicability | Metadata | | F, A, I, R | M | H

Table legend: L: low, M: medium, H: high, ROI: return on investment. FAIR (Findable, Accessible, Interoperable, and Reusable) workshop proposed activities: the numbers reference proposed activities in Lightsom et al. (2022).

Language: English
Submitted on: Aug 18, 2023
Accepted on: Mar 23, 2024
Published on: Apr 26, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Vivian B. Hutchison, Tamar Norkin, Lisa S. Zolly, Leslie Hsu, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.