Fitness for Use of Data Objects Described with Quality Maturity Matrix at Different Phases of Data Production

Open Access | Nov 2020

Figures & Tables

Table 1

Assignment of the DKRZ data dissemination system to the domains as described by Treloar & Harboe-Ree (2008).

Domain | Phase | DKRZ system
Research preparation phase | Concept generation | data management (DM) planning tool RDMO
Private Research | production/processing | DKRZ storage on hard disc and tape HPSS4
Shared Research | project collaboration intended use | ESGF, globally distributed project repository
Public | long-term archiving impact re-use | Long-term Archive
Figure 1

Characteristics of data and metadata Quality Assurance Maturity Levels. QMM levels corresponding a) to different steps of the data production workflow and b) to the five data production phases with their standardisation characteristics and increasing degrees of formalisation.

Table 2

Comparison of SMM and QMM.

SMM | QMM
Software Readiness | Omitted: the data object is considered as persistent. Software development would lead to new data objects except software documentation. That is part of the metadata provenance.
Metadata | Criterion: Completeness Aspect: Existence of Metadata
User Documentation | Criterion: Completeness Aspect: Existence of Metadata
Uncertainty Characterisation | Criterion: Accuracy
Public Access/Feedback/Update | Criterion: Accessibility / Criterion: Completeness Aspect: Existence of Metadata level 5: data provenance chain exists including internal and external objects e.g. software, articles, method and workflow description / Criterion: Consistency Aspect: Versioning and Controlled Vocabularies (CVs)
Usage | Omitted: we use the ISO19157 explanation of data usability. It depends on the ‘particular application’. From this point of view, an evaluation of usage is not possible.
Figure 2

OAIS Reference Model information packages at different phases of the QMM process, showing the submission (SIP), archival (AIP), and dissemination (DIP) information packages.

Figure 3

DKRZ Long Term Archive – example of minimum metadata (PDI, following the OAIS reference model).

Table 3

Overview of the QMM quality criteria and sub-criteria (aspects).

Criterion | Aspect
Consistency | Data Organisation and Data Object
Consistency | Versioning and Controlled Vocabularies (CVs)
Consistency | Data-Metadata Consistency
Completeness | Existence of Metadata
Completeness | Existence of Data
Accessibility | Metadata Access by Identifier
Accessibility | Data Access by Identifier
Accuracy | Plausibility
Accuracy | Statistical Anomalies
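
As a rough illustration of how the criteria and aspects of Table 3 could be held in machine-readable form, the following Python sketch encodes them as a dictionary and checks that an assessment assigns a maturity level between 1 and 5 to every aspect. The names `QMM_CRITERIA` and `validate_assessment`, and the tuple-keyed assessment format, are assumptions made for this example and are not defined by the QMM itself.

```python
# Criteria and aspects as listed in Table 3. The data layout is hypothetical.
QMM_CRITERIA = {
    "Consistency": [
        "Data Organisation and Data Object",
        "Versioning and Controlled Vocabularies (CVs)",
        "Data-Metadata Consistency",
    ],
    "Completeness": ["Existence of Metadata", "Existence of Data"],
    "Accessibility": [
        "Metadata Access by Identifier",
        "Data Access by Identifier",
    ],
    "Accuracy": ["Plausibility", "Statistical Anomalies"],
}


def validate_assessment(assessment):
    """Return a list of problems found in an assessment.

    `assessment` maps (criterion, aspect) tuples to a maturity level.
    A problem is reported for every aspect that is missing or whose
    level is not one of 1, 2, 3, 4, 5.
    """
    problems = []
    for criterion, aspects in QMM_CRITERIA.items():
        for aspect in aspects:
            level = assessment.get((criterion, aspect))
            if level is None:
                problems.append(f"missing: {criterion} / {aspect}")
            elif level not in range(1, 6):
                problems.append(f"invalid level {level!r}: {criterion} / {aspect}")
    return problems
```

Keying the assessment by (criterion, aspect) keeps each level tied to exactly one cell of Tables 4 to 7; other layouts would work equally well.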
Table 4

QMM criterion consistency.

Level 1 | Level 2 | Level 3 | Level 4 R1.2 | Level 5
Aspect: Data Organisation and Data Object
conceptual development
data organisation is structured/conforms to:
internal rules, informally documented | project specification | well-defined rule, e.g. discipline-specific standards and long-term archive requirements (OAIS Package Info – binds) | interdisciplinary standards
data objects (OAIS) are:
SIPs consistent with internal rules | SIPs correspond to project requirements | I1, I2 AIPs conform to well-defined rules, e.g. discipline-specific standards and long-term archive requirements | AIPs conform to interdisciplinary standards
up-to-date and consistent with external scientific objects if feasible
DIPs are fully machine-readable with references to sources
I1 DIPs datasets are self-describing
data formats – Content Data Object (OAIS):
correspond to project requirements | I1 conform to well-defined rules, e.g. discipline-specific standards and long-term archive requirements | conform to interdisciplinary standards
data sizes are consistent
file extensions are consistent
Aspect: Versioning and Controlled Vocabularies (CVs)
conceptual development
versioning follows/is:
internal rules, informally documented | systematic, corresponds to project requirements | systematic collection including documentation of enhancement, conforms to well-defined rules
old versions stored if feasible
In case new versions are published: documentation is consistent with previous versions
data labelled with CVs conform to:
informal CVs if feasible | formal project defined CVs if feasible | I1, I2 discipline-specific standards | interdisciplinary standards
Aspect: Data-Metadata Consistency
not evaluated
OAIS metadata components are consistent:
PDI components: Provenance – unsystematically documented; Reference – creators | PDI components: Provenance – basically documented; Reference – creators, contact; Descriptive Information – naming conventions for discovery – find and search | Complete PDI*: Provenance, Context, Reference – cross, Fixity, Access Rights; and Representation Information, Descriptive Information, Package Info | I3 external metadata and data are consistent
*maintenance and storage policy are not affected, since they belong to the repository certification.
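
The controlled-vocabulary row of Table 4 asks whether data labels conform to an agreed vocabulary. A minimal Python sketch of such a check is given below; the function name `labels_outside_cv` and the example terms are hypothetical and only illustrate the idea of comparing labels against a CV.

```python
def labels_outside_cv(labels, controlled_vocabulary):
    """Return the sorted labels that are not part of the controlled vocabulary."""
    allowed = set(controlled_vocabulary)
    return sorted(set(labels) - allowed)


# Hypothetical project CV and dataset labels, for illustration only.
project_cv = ["air_temperature", "precipitation", "wind_speed"]
dataset_labels = ["air_temperature", "precip", "wind_speed"]
nonconforming = labels_outside_cv(dataset_labels, project_cv)
```

An empty result would indicate that every label is drawn from the vocabulary; which vocabulary applies at which level (informal, project-defined, discipline-specific, interdisciplinary) is given in the table.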
Table 5

QMM criterion completeness.

Level 1 | Level 2 | Level 3 | Level 4 R1.2 | Level 5
Aspect: Existence of Data (Completeness and Persistence)
not evaluated | data is in production and may be deleted or overwritten | datasets exist, not complete and may be deleted but not overwritten unless explicitly specified | data entities (conform to discipline-specific standards) are complete; dynamic datasets – data stream are not affected; number of datasets (aggregation) is consistent; data are persistent, as long as expiration date requires | data entities (conform to interdisciplinary standards) are complete; dynamic datasets – data stream are not affected; number of datasets (aggregation) is consistent; data are persistent, as long as expiration date requires
Aspect: Existence of Metadata
not evaluated
OAIS metadata components exist:
PDI components: Provenance – unsystematically documented; Reference – creators | PDI components: Provenance – basically documented; Reference – creators, contact; Descriptive Information: naming conventions for discovery – find and search | F2, R1 Complete PDI*: R1.2 Provenance, Context, Reference, Fixity, Access Rights; and Representation Information, R1.1 Descriptive Information, F4 Package Info | metadata conforms to interdisciplinary standards; data provenance chain exists including internal and external objects e.g. software, articles, method and workflow description
*maintenance and storage policy are not affected, since they belong to the repository certification.
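
The "Complete PDI" entry of Table 5 lists the OAIS metadata components that must exist. The following Python sketch shows one way this existence check might be automated. The component names are taken from the table, while the dictionary-based record layout and the function name `missing_components` are assumptions for illustration.

```python
# Component names as listed under "Complete PDI" in Table 5.
PDI_COMPONENTS = ("Provenance", "Context", "Reference", "Fixity", "Access Rights")
OTHER_OAIS_COMPONENTS = (
    "Representation Information",
    "Descriptive Information",
    "Package Info",
)


def missing_components(record):
    """Return required components that are absent or empty in a metadata record.

    `record` is a hypothetical dict mapping component names to their content.
    """
    required = PDI_COMPONENTS + OTHER_OAIS_COMPONENTS
    return [name for name in required if not record.get(name)]
```

This only tests that each component is present and non-empty; whether its content is adequate remains a matter for the consistency and accuracy criteria.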
Table 6

QMM criterion accessibility.

Level 1 | Level 2 | Level 3 | Level 4 R1.2 | Level 5
Aspect: Data Access by Identifier
not evaluated
data is accessible by:
file names | internal unique identifier, corresponds to project requirements | permanent identifier (expiration is documented) (OAIS Package Info – identifies); datasets have an expiration date and are accessible for at least 10 years (conform to rules of good scientific practice) | F1, A1 global resolvable identifier (PID – persistent identifier) registered with resolving to data access including backup, where it is commonly accepted that the identifier is persistently resolvable at least to information about the fate of the object; data is accessible within other data infrastructures including cross references
checksums are correct
checksums are accessible
a bijective mapping between identifier and datasets is documented e.g. in data header (OAIS Package Info – binds, identifies)
Aspect: Metadata Access by Identifier
not evaluated
metadata is accessible by:
not specified | internal unique identifier, corresponds to project requirements | permanent identifier, expiration is documented (F4 OAIS Package Info – identifies); complete data citation is persistent | F1, A1 global resolvable identifier including backup; complete data citation is persistent; I3 external PID references are supported
a mapping between data access identifier and metadata access identifier is implemented (OAIS Package Info relates Content Info and PDI)
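
The requirement in Table 6 that "checksums are correct" can be illustrated with a short Python sketch that recomputes a file's checksum and compares it with the recorded value. The table does not name a checksum algorithm, so SHA-256 is an assumption here, as are the function names.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_is_correct(path, recorded_checksum):
    """Compare the recomputed digest with the checksum stored in the metadata."""
    return sha256_of_file(path) == recorded_checksum.strip().lower()
```

Reading in chunks keeps memory use bounded for large data files; an archive that records a different algorithm would substitute the matching `hashlib` constructor.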
Table 7

QMM criterion accuracy.

Level 1 | Level 2 | Level 3 | Level 4 R1.2 | Level 5
Aspect: Plausibility
not evaluated | R1 documented procedure about technical sources of errors and deviation/inaccuracy exists (data header and content is consistent) | R1 documented procedure about methodological sources of errors and deviation/inaccuracy | documented procedure with validation against independent data | R1 references to evaluation results (data) and methods exist
Aspect: Statistical Anomalies
not evaluated | R1 missing values are indicated e.g. with fill values | R1 documented procedure of statistical quality control is available | scientific consistency among multiple data sets and their relationships is documented if feasible
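
To make the "missing values are indicated e.g. with fill values" entry of Table 7 concrete, the sketch below counts how much of a data series consists of a declared fill value. The fill value -999.0 and the function name are hypothetical; in practice the fill value would be read from the dataset's own metadata.

```python
import math


def missing_value_fraction(values, fill_value):
    """Return the fraction of entries that equal the fill value or are NaN."""
    if not values:
        return 0.0
    missing = sum(
        1
        for v in values
        if v == fill_value or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


# Hypothetical series in which -999.0 marks missing measurements.
series = [12.5, -999.0, 13.1, -999.0]
fraction = missing_value_fraction(series, fill_value=-999.0)
```

A check like this only confirms that missing values are flagged consistently; it is not a substitute for the documented statistical quality control procedure the table also refers to.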
Language: English
Submitted on: Feb 20, 2020
Accepted on: Nov 3, 2020
Published on: Nov 17, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Heinke Höck, Frank Toussaint, Hannes Thiemann, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.