Table 1
List of score categories, showing the number of questions in each. ‘FAIR’: Findable, Accessible, Interoperable, Reusable. For a list of all questions, see ‘Supplemental File 1: FAIR Rubric Questions’.
| SCORE CATEGORY | NUMBER OF QUESTIONS |
|---|---|
| Total FAIR score | 62 |
| Findable score | 24 |
| Accessible score | 8 |
| Interoperable score | 18 |
| Reusable score | 12 |
| Essential score | 37 |
| Intermediate score | 15 |
| Advanced score | 10 |

Figure 1
Horizontal box plot with overlaid data points showing all 392 overall Findable, Accessible, Interoperable, and Reusable (FAIR) scores. Each score is normalized to a maximum of 100 and does not take into account questions that are not applicable.

Figure 2
Horizontal box plot with overlaid data points showing scores for all 392 assessments, broken down by the four FAIR principles: Findable, Accessible, Interoperable, and Reusable. Each score is normalized to a maximum of 100 and does not take into account questions that are not applicable.

Figure 3
Horizontal box plot with overlaid data points showing scores for all 392 assessments, broken down by the three levels of FAIR characteristics: Essential, Intermediate, and Advanced. Each score is normalized to a maximum of 100 and does not take into account questions that are not applicable.
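The normalization described in the captions above (scores scaled to a maximum of 100, with not-applicable questions excluded) can be sketched as follows. This is a minimal illustration, not the authors' scoring code; the function name, answer labels, and equal question weighting are assumptions.

```python
def normalized_score(answers):
    """Normalize a set of rubric answers to a 0-100 score.

    answers: list of 'Yes' / 'No' / 'N/A' strings, one per rubric
    question. Questions answered 'N/A' are dropped from both the
    numerator and the denominator, so the score reflects only the
    questions that apply to the dataset (an assumption consistent
    with the captions, which say N/A questions are not counted).
    """
    applicable = [a for a in answers if a != "N/A"]
    if not applicable:
        return None  # no applicable questions: score undefined
    yes_count = sum(1 for a in applicable if a == "Yes")
    return 100 * yes_count / len(applicable)
```

For example, a dataset answering Yes, No, N/A, Yes would score 100 × 2/3 ≈ 66.7, since only three of the four questions apply.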

Figure 4
Bar chart showing the count distribution of the 392 datasets by publication year.

Figure 5
Box plots and data points showing the total normalized Findable, Accessible, Interoperable, and Reusable (FAIR) scores for datasets by publication year.
Table 2
The 10 rubric questions with the highest number of ‘Yes’ answers.
| QUESTION ID | NUMBER YES | QUESTION |
|---|---|---|
| F_4.0.6 | 392 | Is the following descriptive information included in the data release’s metadata? Data publication date |
| F_4.0.1 | 392 | Is the following descriptive information included in the data release’s metadata? Title |
| F_4.0.2 | 392 | Is the following descriptive information included in the data release’s metadata? Description (e.g., Abstract, Summary, Purpose) |
| F_4.0.11 | 391 | Is the following descriptive information included in the data release’s metadata? Keywords |
| F_2.0 | 390 | Is a separate identifier assigned for the data release’s metadata record? |
| F_2.1 | 390 | Is the assigned identifier persistent? |
| F_2.2 | 390 | Is the assigned identifier unique (i.e., has a unique value)? |
| A_1.1 | 390 | Is this landing page publicly accessible? |
| F_4.0.10 | 389 | Is the following descriptive information included in the data release’s metadata? Temporal information associated with the data release (e.g., start date and end date for when data were collected) |
| F_4.0.9 | 388 | Is the following descriptive information included in the data release’s metadata? Geographic location(s) associated with the data release (e.g., coordinates) |

Figure 6
Stacked bar chart showing the distribution of ‘Yes’, ‘No’, and ‘Not Applicable’ (N/A) answers for the rubric questions with the highest number of ‘Yes’ answers.
Table 3
The 12 rubric questions with the highest number of ‘No’ answers. ORCID: Open Researcher and Contributor ID.
| QUESTION ID | NUMBER NO | QUESTION |
|---|---|---|
| I_5.0 | 392 | Is the data release described using Resource Description Format (RDF)/linked data with community-recognized ontologies? |
| F_3.0 | 390 | Are the authors/originators’ ORCID identifiers viewable (to humans) on the data release’s landing page? |
| F_3.1 | 379 | Are the authors/originators’ ORCID identifiers provided in the data release’s metadata? |
| R_1.2 | 363 | Is the approved USGS disclaimer statement present on the data release’s landing page? |
| R_2.3.2 | 359 | Is the following information included with the data release’s metadata? Citation(s) to the citable (community recognized) guidelines or standards used to describe the data quality information (e.g., using ISO 19157) |
| A_3.0.2 | 321 | Is the following information included with the data release’s landing page? Data distributor contact information |
| I_2.3 | 312 | Are all data files in a format that is: Available in multiple file formats |
| R_1.3 | 296 | Are recommended reuses present on the data release’s landing page? AND/OR Are known reuse limits included on the data release’s landing page? |
| R_2.2.2 | 291 | Is the following information included with the data release’s metadata? Citation(s) to the citable (community recognized) guidelines or standards used to describe the process/methodology information |
| R_2.3.1 | 278 | Is the following information included with the data release’s metadata? Detailed data quality information (e.g., data quality procedure documentation; data quality monitoring criteria during data collection, whether/how the completeness of the data files and their data values was evaluated) |
| R_1.1 | 265 | Are recommended reuses included in the data release’s metadata? AND/OR Are known reuse limits included in the data release’s metadata? |
| I_4.1 | 209 | Is information about data value consistency documented in the metadata? |

Figure 7
Stacked bar chart showing the distribution of ‘Yes’, ‘No’, and ‘Not Applicable’ (N/A) answers for the rubric questions with the highest number of ‘No’ answers.
Table 4
The 11 rubric questions with the highest number of ‘Not Applicable’ answers.
| QUESTION ID | NUMBER ‘NOT APPLICABLE’ | QUESTION |
|---|---|---|
| F_4.0.8 | 359 | Is the following descriptive information included in the data release’s metadata? If applicable, data revision dates |
| F_4.0.7 | 349 | Is the following descriptive information included in the data release’s metadata? If applicable, data version |
| I_6.2 | 316 | If there are related data releases (other than source input datasets), are the relationships between the data releases: Described using Resource Description Format (RDF)/linked data |
| I_6.1 | 313 | If there are related data releases (other than source input datasets), are the relationships between the data releases: Documented in the metadata |
| I_3.1.2 | 197 | Does the data release’s metadata contain the following information about the data release’s attributes? ALL names/labels are using citable and publicly available sources |
| R_2.1 | 188 | Is the following information included with the data release’s metadata? If input datasets are used, the citations to the input datasets |
| I_3.1.1 | 162 | Does the data release’s metadata contain the following information about the data release’s attributes? At least one name/label is using a citable and publicly available source |
| I_3.6 | 109 | Does the data release’s metadata contain the following information about the data release’s attributes? Allowable data values |
| R_3.0 | 80 | Related resources documented in the data release’s metadata (e.g., project website, publications, use cases, job aids, user’s guide, data processing code with readme, product algorithm document) |
| I_3.3 | 58 | Does the data release’s metadata contain the following information about the data release’s attributes? Units |
| I_3.5 | 57 | Does the data release’s metadata contain the following information about the data release’s attributes? Data value range |

Figure 8
Stacked bar chart showing the distribution of ‘Yes’, ‘No’, and ‘Not Applicable’ (N/A) answers for the rubric questions with the highest number of ‘Not Applicable’ answers.
Table 5
The 12 rubric questions with the largest increase in ‘Yes’ answers after the USGS data policy was implemented in 2016.
| QUESTION ID | CHANGE POST-POLICY | SCORE PRE-POLICY | SCORE POST-POLICY | QUESTION |
|---|---|---|---|---|
| F_1.1 | 55.3 | 44.4 | 99.6 | Is the assigned identifier persistent? |
| R_1.0 | 42.3 | 37.9 | 80.2 | Is an approved USGS disclaimer statement included in the data release’s metadata? |
| A_4.1 | 40.7 | 54.8 | 95.5 | Can users obtain the data release’s metadata files by manual actions (human) |
| F_1.2 | 36.3 | 63.7 | 100 | Is the assigned identifier unique (i.e., has a unique value)? |
| F_1.3 | 34.7 | 65.3 | 100 | Is the assigned identifier viewable on the data release’s landing page? |
| F_1.0 | 31.5 | 68.5 | 100 | Is an identifier assigned for the data release and documented in the data release’s metadata record? |
| I_2.1 | 25 | 64.5 | 89.6 | Are all data files in a format that is: Non-proprietary (open format, i.e., accessible via free software) |
| A_2.0 | 23.5 | 75 | 98.5 | Does the data release’s identifier resolve to the human readable landing page? |
| I_6.1 | 15.7 | 70 | 85.7 | If there are related data releases (other than source input datasets), are the relationships between the data releases: Documented in the metadata |
| I_2.4 | 15.1 | 83.1 | 98.1 | Are all data files in a format that is: Expected or commonly used by the relevant research community |
| R_2.2.2 | 14.1 | 14.8 | 28.9 | Is the following information included with the data release’s metadata? Citation(s) to the citable (community recognized) guidelines or standards used to describe the process/methodology information |
| I_3.3 | 10.2 | 60.4 | 70.6 | Does the data release’s metadata contain the following information about the data release’s attributes? Units |

Figure 9
Horizontal bar plot of the 11 questions that address elements affected by the 2016 USGS data policy implementation; all questions show an increase in the number of ‘Yes’ answers.
Table 6
Proposed strategies for improving FAIR scores, listed by category, with associated FAIR workshop proposed activities, FAIR elements improved, level of effort, and return on investment (ROI).
| STRATEGY ID | STRATEGY | CATEGORY | FAIR WORKSHOP PROPOSED ACTIVITY | FAIR ELEMENT IMPROVED | LEVEL OF EFFORT | ROI |
|---|---|---|---|---|---|---|
| R1 | Convene repository managers to develop core shared standards for presentation of/access to data and metadata via landing pages | Data Repositories | 5–1 5–12 | F,A | M | M |
| R2 | Move repositories towards standard processes, workflows, and services for intake of new data releases | Data Repositories | 5–5 5–17 5–21 | F,A | M | H |
| P1 | Reevaluate minimum characteristics for repositories to be considered for inclusion in the acceptable repositories list | Policy | 5–1 | F,A | M | M |
| P2 | Clarify requirements for and implementation of disclaimers, licenses, and constraints on use and access | Policy | 2–1 2–2 2–14 | A,R | M | M |
| P3 | Institute peer review process for comprehensive data management plans at project outset | Policy | 7–2 | A,R | M | H |
| C1 | Convene working group to improve data quality documentation practices in metadata | Community & Training | – | R | H | H |
| C2 | Use community-based approach to define data dictionaries that support linked open data | Community & Training | 3–6 | I | H | H |
| C3 | Convene repository managers to develop consistent practices for documenting version history and links between versions | Community & Training | 7–3 | F,A | M | M |
| C4 | Consider developing training program for writing data management plans that anticipate and plan for FAIR requirements | Community & Training | 7–2 | F,A,R | M | H |
| C5 | Use community-based approach to evaluate open and machine-readable data formats and develop best practices for implementation by scientists and repositories | Community & Training | 5–17 5–21 | A,R | M | H |
| C6 | Consider developing training to support broader understanding of persistent identifiers for access, credit, citation, and use of data | Community & Training | – | A,R | L | M |
| C7 | Leverage community groups to support adoption of shared classification schemes and vocabularies to describe and characterize data assets | Community & Training | – | F,A,I | M | M |
| M1 | Consider adoption of ISO to facilitate inclusion of more precise, unambiguous, and FAIR descriptions of dataset characteristics | Metadata | – | F,A,I,R | H | H |
| M2 | Optimize metadata editor tools to document data in a standards-agnostic language, to facilitate interoperability with applications, standards, and workflows | Metadata | – | F,A,I,R | H | H |
| M3 | Promote best practices for reusable metadata elements that are citable and discoverable on their own | Metadata | 2–4 3–4 3–6 | R | M | M |
| M4 | Improve metadata tools, as informed by usability analyses | Metadata | – | F,A,I,R | M | M |
| M5 | Evaluate opportunities to apply AI/ML tools to metadata assessments, possibly broadening range of applicability | Metadata | – | F,A,I,R | M | H |
Table legend: L: low; M: medium; H: high; ROI: return on investment. FAIR (Findable, Accessible, Interoperable, and Reusable) workshop proposed activities: the numbers reference proposed activities in Lightsom et al. (2022).
