
Improving Discovery and Use of NASA’s Earth Observation Data Through Metadata Quality Assessments

Open Access | Apr 2021

Figures & Tables

Figure 1

A conceptual model of the metadata quality assessment process within a data system. A data system is made up of discipline-specific data centers that contribute metadata to a centralized global catalog. To conduct assessments, an independent quality team systematically reviews metadata within the global catalog and reports findings to the discipline-specific data centers. The discipline-specific data center curators update the metadata and resubmit it to the global catalog, improving its quality. The discipline-specific data centers, the internal metadata quality team, and the independent quality team work together to improve the metadata standards and content. For NASA, EOSDIS is the data system, the DAACs are the discipline-specific data centers, EED2 is the internal metadata quality team, and ARC is the independent metadata quality team.

Figure 2

The ARC metadata assessment process.

Table 1

Select automated and manual checks performed by the ARC team during the assessment process.

Data Identification
  Automated checks:
  • Data are identified by a functioning unique identifier (e.g. DOI).
  • The responsible data center is identified using a controlled keyword list.
  Manual checks:
  • The title is human readable and representative of the dataset.
  • The abstract accurately describes the data.
  • Key journal publications describing the data are included.

Descriptive Keywords
  Automated checks:
  • Descriptive science keywords conform to GCMD conventions and/or ISO 19115 topic categories.
  Manual checks:
  • The science keywords accurately describe the data to which they are applied.

URLs
  Automated checks:
  • URLs are responsive and do not redirect.
  • FTP protocol is not utilized.
  Manual checks:
  • Data access URLs point as directly to the data as possible.
  • Only links to relevant online resources are included.

Acquisition Information
  Automated checks:
  • Earth observation platform and instrument names conform to GCMD conventions.
  Manual checks:
  • Reported data collection was during a time when the acquiring instrument was active.
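Automated checks like those in Table 1 lend themselves to simple scripted validation. The following Python snippet is a minimal, illustrative sketch of how the URL checks (responsiveness, no redirects, no FTP) might be implemented; it is not the ARC team's actual tooling, and the `requests` library, the `check_url` helper, and the example URLs are assumptions introduced here for illustration.

```python
from urllib.parse import urlparse

import requests  # third-party HTTP client; assumed available for this sketch


def check_url(url: str, timeout: int = 10) -> list[str]:
    """Return a list of findings for a single metadata URL."""
    findings = []

    # Check: FTP protocol is not utilized.
    if urlparse(url).scheme.lower() == "ftp":
        findings.append("FTP protocol is used")
        return findings  # an HTTP client cannot probe FTP URLs, so stop here

    # Check: the URL is responsive and does not redirect.
    try:
        resp = requests.head(url, allow_redirects=False, timeout=timeout)
        if resp.is_redirect or resp.is_permanent_redirect:
            findings.append(f"URL redirects (HTTP {resp.status_code})")
        elif resp.status_code >= 400:
            findings.append(f"URL is not responsive (HTTP {resp.status_code})")
    except requests.RequestException as exc:
        findings.append(f"URL could not be reached ({exc})")

    return findings


# Hypothetical usage; a real assessment would iterate over every URL in a record.
for url in ["https://example.org/data/granules", "ftp://example.org/archive"]:
    print(url, "->", check_url(url) or "no findings")
```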

Table 2

The ARC team’s assessment priority matrix. A priority matrix is documented for each metadata concept and identifies the criteria that indicate whether a finding should be flagged as high, medium or low priority.

Priority categorization and justification:

Red = High Priority Findings
Emphasizes metadata completeness, accuracy and data accessibility. Metadata that fail to meet CMR requirements or are factually incorrect constitute high priority findings.
Examples:
  • Broken or missing data access URL
  • Non-compliance with a controlled vocabulary
Metadata fields flagged as red are required to be addressed by the data center.

Yellow = Medium Priority Findings
Emphasizes metadata completeness and consistency; recommendations focus on ways to improve data discoverability and usability beyond CMR requirements.
Examples:
  • A URL is missing a description. While not required, descriptions provide important context for the URL.
  • The same resource is labelled differently across metadata records.
Data centers are strongly encouraged to address yellow findings and to provide a rationale for any that are left unaddressed.

Blue = Low Priority Findings
Documents minor metadata consistency, completeness and accuracy issues.
Examples:
  • URLs that need to be updated from the ‘http’ to ‘https’ protocol
  • A DOI is provided but the DOI Authority is not specified
Addressing blue findings is optional and at the discretion of the data center.

Green = No Findings/Issues
Metadata elements flagged green are free of issues and require no action on the part of the data center.
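One way to make a priority matrix like Table 2 machine-actionable is to attach a priority level to each finding so that assessment reports can be sorted and filtered. The sketch below is purely illustrative: the enum values mirror the colour categories in Table 2, but the `Priority` and `Finding` names, fields, and example records are assumptions, not part of the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    """Colour-coded priority levels mirroring the ARC assessment matrix."""
    HIGH = "red"       # must be addressed by the data center
    MEDIUM = "yellow"  # strongly encouraged to address
    LOW = "blue"       # optional, at the data center's discretion
    NONE = "green"     # no action required


@dataclass
class Finding:
    """A single finding tied to one metadata element of one record."""
    record_id: str
    element: str
    description: str
    priority: Priority


# Hypothetical findings illustrating the categories in Table 2.
findings = [
    Finding("C0001-EXAMPLE", "RelatedUrls", "Broken data access URL", Priority.HIGH),
    Finding("C0001-EXAMPLE", "RelatedUrls", "URL is missing a description", Priority.MEDIUM),
    Finding("C0002-EXAMPLE", "DOI", "DOI Authority is not specified", Priority.LOW),
]

# Data centers would typically triage the high priority items first.
for f in sorted(findings, key=lambda f: list(Priority).index(f.priority)):
    print(f.priority.name, f.element, "-", f.description)
```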
Figure 3

The five collection-level metadata concepts that received the most high-priority recommendations from the ARC team. Since URLs appear in multiple UMM metadata elements, the number of reported findings shown exceeds the number of records reviewed.

Figure 4

The cumulative number of findings in the high (red), medium (yellow) and low (blue) priority categories for the nine data centers upon initial assessment (left) and after reassessment (right). The percent improvement in the number of findings is shown above the three columns on the right.
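The percent improvement reported in Figure 4 is a simple relative reduction in the number of findings between the initial assessment and the reassessment. As a sketch with made-up counts (the paper's actual numbers appear in the figure itself):

```python
def percent_improvement(initial: int, after: int) -> float:
    """Relative reduction in findings from initial assessment to reassessment."""
    return 100.0 * (initial - after) / initial


# Hypothetical counts for one priority category at one data center.
print(f"{percent_improvement(initial=120, after=30):.1f}% improvement")  # -> 75.0% improvement
```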

Language: English
Submitted on: Dec 29, 2020
Accepted on: Apr 11, 2021
Published on: Apr 28, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Kaylin Bugbee, Jeanné le Roux, Adam Sisco, Aaron Kaulfus, Patrick Staton, Camille Woods, Valerie Dixon, Christopher Lynnes, Rahul Ramachandran, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.