Table 1
Summary of the FAIRness evaluation tools that we assessed but decided not to use in the context of this study. The evaluation approaches were assessed in April 2021; a reassessment of some tools took place in February 2022 (see references).
| TOOL | NOT USED BECAUSE | REFERENCE |
|---|---|---|
| ANDS-Nectar-RDS FAIR data self-assessment tool | not accessible | ANDS (2021) |
| DANS-Fairdat | pilot version meant for internal testing at DANS | Thomas (2017) |
| SATIFYD | no longer maintained (L. Cepinskas (DANS), pers. comm., 24 March 2021) | Fankhauser et al. (2019) |
| The CSIRO 5-star Data Rating tool | not accessible as online tool | Yu & Cox (2017) |
| The Scientific Data Stewardship Maturity Assessment Model | non-automated capture of evaluation results; proprietary document format | Peng et al. (2015) |
| Data Stewardship Wizard | assistance for FAIR data management planning, not for evaluation of archived data | Pergl et al. (2019) |
| RDA-SHARC Evaluation | no fillable form readily provided | David et al. (2018) |
| WMO Stewardship Maturity Matrix for Climate Data (SMM-CD) | non-automated capture of evaluation results; proprietary document format | Peng et al. (2020) |
| Data Use and Services Maturity Matrix | unclear application concept | The MM-Serv Working Group (2018) |
| ARDC FAIR Self-Assessment Tool | test results not saveable; no quantitative FAIR measure | Schweitzer et al. (2021) |
Table 2
Summary of the five FAIRness evaluation tools used in this study. The hybrid method of FAIRshake combines automated and manual evaluation. The covered FAIR ((F)indable, (A)ccessible, (I)nteroperable, (R)eusable) dimensions refer to the number of metrics the tool tests; for example, FMES checks Findability using 8 different tests.
| TOOL | ACRONYM | METHOD | COVERED FAIR DIMENSIONS | REFERENCE |
|---|---|---|---|---|
| Checklist for Evaluation of Dataset Fitness for Use | CFU | manual | n/a | Austin et al. (2019) |
| FAIR Maturity Evaluation Service | FMES | automated | F: 8, A: 5, I: 7, R: 2 | Wilkinson et al. (2019) |
| FAIRshake | n/a | hybrid | F: 3, A: 1, I: 0, R: 5 | Clarke et al. (2019) |
| F-UJI | n/a | automated | F: 7, A: 3, I: 4, R: 10 | Devaraju et al. (2021) |
| Self-Assessment | n/a | manual | F: 13, A: 12, I: 10, R: 10 | Bahim et al. (2020) |
Table 3
WDCC projects selected for evaluation. The project acronyms can be used directly to search for and find the evaluated projects via the WDCC GUI. The project volume in TB (third column) refers to the total volume of the entire project named in the first column. See Peters-von Gehlen & Höck (2021) for details of the evaluated resources.
| PROJECT ACRONYM | DATA SUMMARY | PROJECT VOLUME [TB] | DOI ASSIGNED | CREATION DATE | COMMENTS |
|---|---|---|---|---|---|
| IPCC-AR5_CMIP5 | Coupled Climate Model Output, prepared following CMIP5 guidelines and forming the basis of the IPCC 5th Assessment Report (2 AICs evaluated) | 1655 | yes and no | 2012-05-31 and 2011-10-10 | |
| CliSAP | Observational data products from satellite remote sensing (2 AICs evaluated) | 163 | yes and no | 2015-09-15 and 2009-11-12 | one collection with no data access |
| WASCAL | Dynamically downscaled climate data for West Africa | 73 | yes | 2017-02-23 | |
| CMIP6_RCM_forcing_MPI-ESM1-2 | Coupled Climate Model output prepared as boundary conditions for regional climate models, following CMIP6 experiment guidelines | 51 | yes | 2020-02-27 | |
| MILLENNIUM_COSMOS | Coupled Climate Model ensemble simulations covering the last millennium (800–2000 AD) | 47 | no | 2009-05-12 | |
| IPCC_TAR_ECHAM4/OPYC | Coupled Climate Model Output, prepared to support the IPCC's 3rd Assessment Report | 2.6 | yes | 2003-01-26 | Experiment and dataset with DOI; first ever DOI assigned to data (Stendel et al. 2004) |
| Storm_Tide_1906_German_Bight | Numerical simulation of the 1906 storm tide in the German Bight | 0.3 | yes | 2020-10-27 | |
| COPS | Observational data obtained from radar remote sensing during the COPS (Convective and Orographically-Induced Precipitation Study) campaign | 0.2 | yes | 2008-01-28 | |
| HDCP2-OBS | Observations collected during the HDCP2 (High Definition Clouds and Precipitation for Climate Prediction) project | 0.06 | yes | 2018-09-18 | |
| OceanRAIN | In-situ, along-track shipboard observations of routinely measured atmospheric and oceanic state parameters over global oceans | 0.01 | yes | 2017-12-13 | |
| CARIBIC | Observations of atmospheric parameters obtained from commercial aircraft equipped with an instrumentation container | 7.7E-5 | no | 2002-04-27 | |
Table 4
Results of FAIR assessments of WDCC data holdings using the ensemble of FAIRness evaluation tools detailed in Section 2.1. The scores per test are calculated as the unweighted mean over all tested FAIR maturity indicators. The mean (∅), standard deviation (σ) and relative standard deviation per project (three rightmost columns) are calculated across the scores of the five FAIR assessment tools. The mean value representative of the WDCC (∅ (WDCC), last row) is calculated over all values in the respective column of the table. See main text for more details. Results at finer granularity are provided in the supporting data (Peters-von Gehlen et al., 2021).
| PROJECT ACRONYM | SELF-ASSESSMENT | CFU | FMES | F-UJI | FAIRSHAKE | ∅ PER PROJECT | σ PER PROJECT | σ/∅ PER PROJECT |
|---|---|---|---|---|---|---|---|---|
| IPCC-AR5_CMIP5 | 0.84 | 0.72 | 0.44 | 0.58 | 0.95 | 0.71 | 0.20 | 0.29 |
| IPCC-AR5_CMIP5, no DOI | 0.65 | 0.67 | 0.44 | 0.54 | 0.93 | 0.65 | 0.19 | 0.29 |
| CliSAP | 0.86 | 0.78 | 0.48 | 0.58 | 0.97 | 0.73 | 0.20 | 0.28 |
| CliSAP, no data accessible | 0.27 | 0.30 | 0.43 | 0.52 | 0.64 | 0.43 | 0.15 | 0.36 |
| WASCAL | 0.90 | 0.80 | 0.50 | 0.58 | 0.91 | 0.74 | 0.18 | 0.25 |
| CMIP6_RCM_forcing_MPI-ESM1-2 | 0.86 | 0.85 | 0.57 | 0.62 | 0.92 | 0.76 | 0.16 | 0.21 |
| MILLENNIUM_COSMOS | 0.63 | 0.53 | 0.45 | 0.51 | 0.82 | 0.59 | 0.14 | 0.24 |
| IPCC_TAR_ECHAM4/OPYC | 0.82 | 0.63 | 0.50 | 0.64 | 0.89 | 0.70 | 0.16 | 0.23 |
| Storm_Tide_1906_German_Bight | 0.90 | 0.68 | 0.55 | 0.62 | 0.83 | 0.71 | 0.15 | 0.21 |
| COPS | 0.86 | 0.47 | 0.53 | 0.55 | 0.87 | 0.66 | 0.19 | 0.29 |
| HDCP2-OBS | 0.90 | 0.48 | 0.53 | 0.59 | 0.86 | 0.67 | 0.19 | 0.29 |
| OceanRAIN | 0.90 | 0.75 | 0.57 | 0.60 | 0.97 | 0.76 | 0.18 | 0.23 |
| CARIBIC | 0.62 | 0.70 | 0.50 | 0.54 | 0.82 | 0.64 | 0.13 | 0.20 |
| ∅(WDCC) | 0.77 | 0.64 | 0.50 | 0.58 | 0.88 | 0.67 | 0.15 | 0.22 |
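For illustration, the following minimal Python sketch (not part of the original analysis) shows how the three rightmost columns of Table 4 can be reproduced from the five per-tool scores of a single project, here the IPCC-AR5_CMIP5 row, assuming the sample standard deviation is used; the ∅ (WDCC) row follows the same pattern, averaging each column over all evaluated projects.

```python
from statistics import mean, stdev

# Per-tool scores for one project (IPCC-AR5_CMIP5 row of Table 4), in the
# column order Self-Assessment, CFU, FMES, F-UJI, FAIRshake.
scores = [0.84, 0.72, 0.44, 0.58, 0.95]

project_mean = mean(scores)              # ∅ per project
project_sd = stdev(scores)               # σ per project (sample standard deviation, an assumption)
relative_sd = project_sd / project_mean  # σ/∅ per project (relative standard deviation)

print(f"∅ = {project_mean:.2f}, σ = {project_sd:.2f}, σ/∅ = {relative_sd:.2f}")
# -> ∅ = 0.71, σ = 0.20, σ/∅ = 0.29
```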
Table 5
Cross-correlations between the scores per project obtained with the five FAIRness evaluation tools (Table 4).
| | SELF-ASSESSMENT | CFU | FMES | F-UJI | FAIRSHAKE |
|---|---|---|---|---|---|
| Self-Assessment | n/a | 0.61 | 0.65 | 0.73 | 0.79 |
| CFU | | n/a | 0.36 | 0.50 | 0.78 |
| FMES | | | n/a | 0.65 | 0.30 |
| F-UJI | | | | n/a | 0.49 |
| FAIRshake | | | | | n/a |
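As an illustration of how entries in Table 5 can be obtained, the sketch below (Python, not part of the original study) computes the correlation between the Self-Assessment and CFU score columns of Table 4, assuming a Pearson correlation over the per-project scores and excluding the ∅ (WDCC) summary row; small deviations from the published values are expected because only the rounded scores from Table 4 are used here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Per-project scores taken from Table 4 (rounded values), in row order and
# without the ∅ (WDCC) summary row -- an assumption about how Table 5 was built.
self_assessment = [0.84, 0.65, 0.86, 0.27, 0.90, 0.86, 0.63, 0.82, 0.90, 0.86, 0.90, 0.90, 0.62]
cfu             = [0.72, 0.67, 0.78, 0.30, 0.80, 0.85, 0.53, 0.63, 0.68, 0.47, 0.48, 0.75, 0.70]

print(round(pearson(self_assessment, cfu), 2))  # close to the 0.61 reported for Self-Assessment vs. CFU
```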
Table 6
Summary of the experiences gained from applying the ensemble of different FAIRness evaluation approaches in this study.
| | AUTOMATED | MANUAL | HYBRID |
|---|---|---|---|
| applied tools | FMES (Wilkinson et al., 2019); F-UJI (Devaraju & Huber, 2020) | CFU; Self-Assessment (Bahim et al., 2020) | FAIRshake (Clarke et al., 2019) |
| application/use of the tool | the tools take the PID/DOI of the resource to be evaluated, if available; selection of appropriate metric sets is critical and requires prior review | completing the questionnaires is time intensive and depends on the extent of the metrics; expert knowledge is essential | the tools take the PID/DOI of the resource to be evaluated; selection of appropriate metric sets is critical and requires prior review; expert knowledge is required to evaluate contextual reusability; time intensive |
| preservation of results | results are saved in an online database or are exported (printed) as PDF; local installations store results locally; the date of the evaluation has to be noted manually (in the tools evaluated here) | results are saved locally as spreadsheets; the date of the evaluation has to be noted manually | results are saved in an online database; the date of the evaluation has to be noted manually (using the tool evaluated here) |
| interpretation of results | detailed information on the applied metrics is available as documentation; if tests fail, the tools provide technical output interpretable by experts; results are provided as a quantitative measure | the form is filled in by a knowledgeable expert, so interpretation is performed during the evaluation itself; quantification of results depends on the evaluator's perception | detailed information on the applied automated metrics is available as documentation; manual parts are filled in by a knowledgeable expert, so interpretation is performed during the evaluation itself; quantification of results partly depends on the evaluator's perception |
| reproducibility | results are reproducible as long as the same code version is used | human evaluation is subjective; reproducibility depends on manual documentation of each evaluation | reproducibility of automated parts is given as long as the same code version is used; human evaluation is subjective and reproducibility depends on manual documentation of each evaluation |
| evaluation of technical reusability/machine actionability | good; tests fail if code specifications are not exactly met | limited; machine actionability cannot be specifically tested; assessment is based only on implemented methods/protocols, not their functionality | very good; failed automated tests can be amended manually, given that an implementation is present but does not exactly match the test implementation |
| evaluation of contextual reusability | limited; domain-specific and agreed standardised FAIR metrics are needed | good to excellent; depends on the domain expertise of the evaluator and the time and effort put into the evaluation | good to excellent; depends on the domain expertise of the evaluator and the time and effort put into the evaluation |
| ACRONYM | DEFINITION |
|---|---|
| AIC | Archival Information Collection |
| AIP | Archival Information Package |
| AIU | Archival Information Unit |
| ANDS | Australian National Data Service |
| AR5 | 5th Assessment Report |
| ARDC | Australian Research Data Commons |
| CFU | Checklist for Evaluation of Dataset Fitness for Use |
| CliSAP | Integrated Climate System Analysis and Prediction |
| CMIP5/6 | Coupled Model Intercomparison Project 5/6 |
| COPS | Convective and Orographically Induced Precipitation Study |
| CORDEX | Coordinated Regional Downscaling Experiment |
| CSIRO | Commonwealth Scientific and Industrial Research Organisation |
| DANS | Data Archiving and Networked Services |
| DKRZ | German Climate Computing Center |
| DOI | Digital Object Identifier |
| DSJ | Data Science Journal |
| FMES | FAIR Maturity Evaluation Service |
| GUI | Graphical User Interface |
| HDCP2 | High Definition Clouds and Precipitation for Climate Prediction |
| IPCC | Intergovernmental Panel on Climate Change |
| JSON-LD | JavaScript Object Notation for Linked Data |
| NetCDF | Network Common Data Form |
| OAIS | Open Archival Information System |
| ORCiD | Open Researcher and Contributor Identifier |
| PB | Petabyte |
| PID | Persistent Identifier |
| RCM | Regional Climate Model |
| RDA | Research Data Alliance |
| URL | Uniform Resource Locator |
| WASCAL | West African Science Service Centre on Climate Change and Adapted Land Use |
| WDCC | World Data Center for Climate |
| WDS | World Data System |
| WG | Working Group |
| WMO | World Meteorological Organization |
