CMIP6 Data Citation of Evolving Data

Open Access | Jun 2017

Figures & Tables

Table 1

General Force11 data citation principles (left) and recommendations of the RDA WGDC on the citation of evolving data (right).

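| Force11 – Data Citation Principles | RDA-endorsed recommendations on the citation of evolving data (RDA WGDC) | RDA WGDC group |
|---|---|---|
| 1. Importance | R1 – Data Versioning | A. Preparing the Data and the Query Store |
| 2. Credit and Attribution | R2 – Timestamping | A. Preparing the Data and the Query Store |
| 3. Evidence | R3 – Query Store | A. Preparing the Data and the Query Store |
| 4. Unique Identification | R4 – Query Uniqueness | B. Persistently Identify Specific Data sets |
| 5. Access | R5 – Stable Sorting | B. Persistently Identify Specific Data sets |
| 6. Persistence | R6 – Result Set Verification | B. Persistently Identify Specific Data sets |
| 7. Specificity and Verifiability | R7 – Query Timestamping | B. Persistently Identify Specific Data sets |
| 8. Interoperability and Flexibility | R8 – Query PID | B. Persistently Identify Specific Data sets |
| | R9 – Store Query | B. Persistently Identify Specific Data sets |
| | R10 – Citation Text | B. Persistently Identify Specific Data sets |
| | R11 – Landing Page | C. Upon Request of a PID |
| | R12 – Machine Actionability | C. Upon Request of a PID |
| | R13 – Technology Migration | D. Upon Modifications to the Data Infrastructure |
| | R14 – Migration Verification | D. Upon Modifications to the Data Infrastructure |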
model citation: <mip_era>/<activity_id>/<institution_id>/<source_id>
experiment citation: <mip_era>/<activity_id>/<institution_id>/<source_id>/<experiment_id>
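
The two citation granularities above are path-like templates over CMIP6 facet names. The following is a minimal sketch of how such identifiers might be assembled, assuming the facets of a dataset are available as a dictionary. The function name `citation_id` and the example facet values are illustrative assumptions and are not taken from the CMIP6 citation service.

```python
# Facet order follows the two templates given in the text.
MODEL_FACETS = ("mip_era", "activity_id", "institution_id", "source_id")
EXPERIMENT_FACETS = MODEL_FACETS + ("experiment_id",)


def citation_id(facets, granularity="model"):
    """Join facet values in template order (hypothetical helper)."""
    if granularity == "model":
        names = MODEL_FACETS
    elif granularity == "experiment":
        names = EXPERIMENT_FACETS
    else:
        raise ValueError(f"unknown granularity: {granularity!r}")
    missing = [name for name in names if not facets.get(name)]
    if missing:
        raise KeyError(f"missing facets: {missing}")
    return "/".join(facets[name] for name in names)


# Illustrative facet values only.
example = {
    "mip_era": "CMIP6",
    "activity_id": "CMIP",
    "institution_id": "MPI-M",
    "source_id": "MPI-ESM1-2-LR",
    "experiment_id": "historical",
}

print(citation_id(example, "model"))
print(citation_id(example, "experiment"))
```

An experiment citation identifier always extends the corresponding model citation identifier by one path element, so the coarser granularity can be derived from the finer one by dropping `experiment_id`.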
Figure 1

Relation between CMIP6 early data citations and the IPCC-DDC AR6 Reference Data Archive in terms of data citations. The IPCC-DDC AR6 data is a snapshot as well as a subset of the CMIP6 data.

Figure 2

CMIP6 Data Citation Concept with the three main use cases to meet: provide citation information, discover citation information, and resolve a data citation (red: CMIP6 citation services, blue: ESGF services, green: services of scholarly publishers).

Figure 3

Landing page example for CMIP6 early citation based on CMIP5 data as proof of concept.

Figure 4

ESGF portal with display of data citation information for a CMIP5 data example as proof of concept.

Table 2

Comparison of the two-PID approach for CMIP6 citation against the RDA-endorsed recommendations for the citation of evolving data.

| RDA Recommendation | Extended CMIP6 Citation Concept (including Data Subset PIDs based on a data cart approach) |
|---|---|
| R1 – Data Versioning | Individual datasets in the data subset collection are versioned and static. The version is part of the dataset ID and part of the stored queries. |
| R2 – Timestamping | For the individual datasets in the data subset collection, publication and unpublication are timestamped and the metadata is flagged. The time of publication is part of the version string. Versioned datasets are not changed. Changed datasets are published under new versions. Data cart content is not changed by data publication. |
| R3 – Query Store | The faceted ESGF search provides the query functionality. These queries are stored in the data carts. Individual queries include a filter by version, i.e. ESGF publication period. Data cart content is reproduced by executing a combined query that connects every stored query with a logical OR. |
| R4 – Query Uniqueness | The stored queries of the faceted ESGF search have a uniform order of search attributes. Because it is highly unlikely that two users will use identical data subsets in their data carts, query uniqueness is not checked. |
| R5 – Stable Sorting | Datasets as smallest subsets are static and consist of one or more individual files. The order of the downloaded files is not important. Record sorting is not relevant for this application. |
| R6 – Result Set Verification | The versioned datasets in the data cart are static and can be verified by their SHA256 checksums stored in the metadata. For the queries in the cart, an additional checksum based on the result set is available in order to identify missing files in the downloaded data cart data. |
| R7 – Query Timestamping | Query results are static because their search results consist of static, versioned individual datasets. Timestamping of the queries is not necessary but can provide useful provenance information on cart content history. |
| R8 – Query PID | Public data carts are assigned unique IDs, which are used for data cart content display. PIDs are registered upon user request on public data carts. Their contents are no longer changeable by the users. |
| R9 – Store Query | When the user adds data to a data cart, the query for the faceted ESGF search is stored in a normalized form together with the timestamp and a checksum. Additional metadata is stored for public data carts with registered PIDs, i.e. citation information and references to the data superset DOIs. |
| R10 – Citation Text | The citation recommendation is displayed on the landing page of the data subset PID, including references to the data superset DOIs. |
| R11 – Landing Page | A landing page for the public data carts is provided. For public data carts with PIDs, the citation recommendation is added. |
| R12 – Machine Actionability | A machine-readable version of the landing page based on the ESGF search API will be provided for public data carts, including download information. Because of the possibly high data volume, an automated download without checking the download volume beforehand is not desirable. |
| R13 – Technology Migration | The query results depend only on the syntax of the faceted ESGF search. Technologically, a migration is a transfer of data cart metadata. Changes in the ESGF search facet names would require query rewrites and verification. |
| R14 – Migration Verification | Not verified |
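
The data cart mechanics described for R3, R4, R6 and R9 can be sketched in a few lines: store each query in a normalized form with a timestamp and a checksum, reproduce the cart content by joining the stored queries with a logical OR, and use checksums to detect missing files. This is only an illustration under stated assumptions. The `name=value&...` query string, the function names, and the choice to hash the sorted per-file SHA-256 values as the result-set checksum are all hypothetical and do not reproduce the ESGF search API or the actual data cart implementation.

```python
import hashlib
from datetime import datetime, timezone


def normalize_query(facets):
    """Return a canonical string: facet names and values in sorted order."""
    parts = []
    for name in sorted(facets):
        values = facets[name]
        if isinstance(values, str):
            values = [values]
        parts.extend(f"{name}={value}" for value in sorted(values))
    return "&".join(parts)


def result_set_checksum(file_checksums):
    """Hash the sorted per-file checksums so file order does not matter."""
    digest = hashlib.sha256()
    for checksum in sorted(file_checksums):
        digest.update(checksum.encode("ascii"))
    return digest.hexdigest()


def store_query(cart, facets, file_checksums, now=None):
    """Append a normalized query with timestamp and result-set checksum."""
    entry = {
        "query": normalize_query(facets),
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "checksum": result_set_checksum(file_checksums),
    }
    cart.append(entry)
    return entry


def combined_query(cart):
    """Reproduce the cart content: every stored query joined by OR."""
    return " OR ".join(f"({entry['query']})" for entry in cart)


def missing_files(expected, downloaded):
    """Names of expected files that are absent or whose checksum differs."""
    return sorted(
        name for name, checksum in expected.items()
        if downloaded.get(name) != checksum
    )
```

Because the facets are sorted before the query is stored, two carts built from the same selections in a different order produce identical stored queries, which is the property the uniform attribute order in R4 relies on. Sorting the file checksums before hashing reflects R5: the order of downloaded files carries no meaning.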
Language: English
Submitted on: Oct 17, 2016
Accepted on: May 8, 2017
Published on: Jun 15, 2017
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2017 Martina Stockhause, Michael Lautenschlager, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.