Table 1
General Force11 data citation principles (left) and recommendations of the RDA WGDC on the citation of evolving data (right).
| Force11 – Data Citation Principles | RDA-endorsed recommendations on the citation of evolving data (RDA WGDC) | |
|---|---|---|
| 1. Importance | R1 – Data Versioning | A. Preparing the Data and the Query Store |
| 2. Credit and Attribution | R2 – Timestamping | |
| 3. Evidence | R3 – Query Store | |
| 4. Unique Identification | R4 – Query Uniqueness | B. Persistently Identify Specific Data sets |
| 5. Access | R5 – Stable Sorting | |
| 6. Persistence | R6 – Result Set Verification | |
| 7. Specificity and Verifiability | R7 – Query Timestamping | |
| 8. Interoperability and Flexibility | R8 – Query PID | |
| | R9 – Store Query | |
| | R10 – Citation Text | |
| | R11 – Landing Page | C. Upon Request of a PID |
| | R12 – Machine Actionability | |
| | R13 – Technology Migration | D. Upon Modifications to the Data Infrastructure |
| | R14 – Migration Verification | |

| Citation | Template |
|---|---|
| Model citation | `<mip_era>/<activity_id>/<institution_id>/<source_id>` |
| Experiment citation | `<mip_era>/<activity_id>/<institution_id>/<source_id>/<experiment_id>` |
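
The two citation templates above can be assembled mechanically from a dataset's facets. The following is a minimal sketch, not the authors' implementation: the class `DatasetFacets`, both function names, and the facet values in the usage lines are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetFacets:
    """Facets named in the templates; the class itself is a hypothetical helper."""
    mip_era: str
    activity_id: str
    institution_id: str
    source_id: str
    experiment_id: str


def model_citation_id(facets: DatasetFacets) -> str:
    """Build <mip_era>/<activity_id>/<institution_id>/<source_id>."""
    return "/".join(
        [facets.mip_era, facets.activity_id, facets.institution_id, facets.source_id]
    )


def experiment_citation_id(facets: DatasetFacets) -> str:
    """Extend the model-level identifier with <experiment_id>."""
    return f"{model_citation_id(facets)}/{facets.experiment_id}"


# Illustrative placeholder values (assumptions, not taken from the text).
example = DatasetFacets("CMIP6", "CMIP", "INST-A", "MODEL-A", "historical")
print(model_citation_id(example))       # CMIP6/CMIP/INST-A/MODEL-A
print(experiment_citation_id(example))  # CMIP6/CMIP/INST-A/MODEL-A/historical
```

Because the experiment-level identifier only appends one component to the model-level one, every experiment citation can be traced back to its model citation by dropping the last path element.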

Figure 1
Relation between CMIP6 early data citations and the IPCC-DDC AR6 Reference Data Archive in terms of data citations. The IPCC-DDC AR6 data is a snapshot as well as a subset of the CMIP6 data.

Figure 2
CMIP6 Data Citation Concept with the three main use cases to meet: provide citation information, discover citation information, and resolve a data citation (red: CMIP6 citation services, blue: ESGF services, green: services of scholarly publishers).

Figure 3
Landing page example for CMIP6 early citation based on CMIP5 data as proof of concept.

Figure 4
ESGF portal with display of data citation information for a CMIP5 data example as proof of concept.

Table 2
Comparison of the two-PID approach for CMIP6 citation against the RDA-endorsed recommendations for the citation of evolving data.
| RDA Recommendation | Extended CMIP6 Citation Concept (including Data Subset PIDs based on a data cart approach) |
|---|---|
| R1 – Data Versioning | Individual datasets in the data subset collection are versioned and static. The version is part of the dataset ID and part of the stored queries. |
| R2 – Timestamping | For the individual datasets in the data subset collection, publication and unpublication are timestamped and the metadata is flagged. The time of publication is part of the version string. Versioned datasets are not changed; changed datasets are published under new versions. Data cart content is not changed by data publication. |
| R3 – Query Store | The faceted ESGF search provides the query functionality. These queries are stored in the data carts. Individual queries include a filter by version, i.e., by ESGF publication period. Data cart content is reproduced by executing a combined query that connects all stored queries with a logical OR. |
| R4 – Query Uniqueness | The stored queries of the faceted ESGF search have a uniform order of search attributes. Because it is highly unlikely that two users will use identical data subsets in their data carts, query uniqueness is not checked. |
| R5 – Stable Sorting | Datasets, as the smallest subsets, are static and consist of one or more individual files. The order of the downloaded files is not important. Record sorting is not relevant for this application. |
| R6 – Result Set Verification | The versioned datasets in the data cart are static and can be verified by their SHA256 checksums stored in the metadata. For each query in the cart, an additional checksum based on the result set is available to identify files missing from the downloaded cart data. |
| R7 – Query Timestamping | Query results are static as their search results consist of static versioned individual datasets. Timestamping of the queries is not necessary but can provide useful provenance information on cart content history. |
| R8 – Query PID | Public data carts are assigned unique IDs, which are used for data cart content display. PIDs are registered upon user request on public data carts. Their contents are no longer changeable by the users. |
| R9 – Store Query | When the user adds data to a data cart, the query for the faceted ESGF search is stored in a normalized form together with the timestamp and a checksum. Additional metadata is stored for public data carts with registered PIDs, i.e. citation information and references to the data superset DOIs. |
| R10 – Citation Text | The citation recommendation is displayed on the landing page of the data subset PID including references to the data superset DOIs. |
| R11 – Landing Page | A landing page for the public data carts is provided. For public data carts with PIDs the citation recommendation is added. |
| R12 – Machine Actionability | A machine-readable version of the landing page based on the ESGF search API will be provided for public data carts, including download information. Because of the potentially high data volume, an automated download without first checking the download volume is not desirable. |
| R13 – Technology Migration | The query results are only dependent on the syntax of the faceted ESGF search. Technologically, a migration is a transfer of data cart metadata. Changes in the ESGF search facet names would require query rewrites and verification. |
| R14 – Migration Verification | Not verified. |
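
The query handling described for R3, R4, R6, and R9 can be illustrated with a short sketch. It rests on several assumptions: alphabetical facet order stands in for the "uniform order of search attributes", the `name=value` query strings are a hypothetical serialization and not the actual ESGF search API syntax, and the result-set checksum is modeled as a SHA-256 hash over the sorted per-file checksums. All function names, file names, and facet values are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def normalize_query(facets):
    """Serialize facet constraints in a uniform order (assumed: alphabetical)."""
    return "&".join(f"{name}={facets[name]}" for name in sorted(facets))


def result_set_checksum(file_checksums):
    """Hash the sorted per-file checksums so that file order does not matter."""
    digest = hashlib.sha256()
    for checksum in sorted(file_checksums):
        digest.update(checksum.encode("ascii"))
    return digest.hexdigest()


def store_query(cart, facets, file_checksums):
    """Append the normalized query, a timestamp, and a checksum to the cart."""
    entry = {
        "query": normalize_query(facets),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "result_checksum": result_set_checksum(file_checksums),
    }
    cart.append(entry)
    return entry


def combined_query(cart):
    """Join all stored queries with a logical OR to reproduce the cart content."""
    return " OR ".join(f"({entry['query']})" for entry in cart)


def verify_download(expected, downloaded):
    """Return (missing, corrupted) file names.

    expected maps file name -> SHA-256 hex digest taken from the metadata;
    downloaded maps file name -> file content as bytes.
    """
    missing = sorted(name for name in expected if name not in downloaded)
    corrupted = sorted(
        name
        for name, content in downloaded.items()
        if name in expected and hashlib.sha256(content).hexdigest() != expected[name]
    )
    return missing, corrupted


# Usage with made-up file contents and facet values.
files = {"tas_a.nc": b"temperature", "pr_a.nc": b"precipitation"}
expected = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

cart = []
store_query(cart, {"version": "20190710", "source_id": "MODEL-A", "variable": "tas"},
            [expected["tas_a.nc"]])
store_query(cart, {"variable": "pr", "source_id": "MODEL-A", "version": "20190710"},
            [expected["pr_a.nc"]])

print(combined_query(cart))
# (source_id=MODEL-A&variable=tas&version=20190710) OR (source_id=MODEL-A&variable=pr&version=20190710)
print(verify_download(expected, {"tas_a.nc": b"temperature"}))
# (['pr_a.nc'], [])
```

Sorting the facets before serialization means two queries with the same constraints yield the same stored string regardless of the order in which a user selected them. Because each query filters by version, re-executing the combined query later returns the same static datasets.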
