Figures & Tables

Figure 1

Tracking interdisciplinary samples throughout the cycle of field collection, transport to collaborators and other labs, various analyses, and digital records.

Table 1

Examples of PIDs that have been used for samples, modified from Guralnick et al. (2015).

| IDENTIFIER TYPE | IDENTIFIER EXAMPLE | SCOPE |
| --- | --- | --- |
| ARK | ark:/12148/btv1b8449691v | Flexible |
| URN | urn:catalog:UMMZ:Mammals:171041 | Flexible |
| HTTP URI | http://data.rbge.org.uk/herb/E00115694 | Flexible |
| DOI | 10.7299/X7VQ32SJ | Flexible, mostly papers and datasets |
| UUID | EF0A4D3E-702F-4882-81B8-CA737AEB7B28 | Flexible |
| IGSN | IGSN: IECUR0002 | Geoscience, working to become general physical sample identifier |
| CETAF URI, based on HTTP URI | http://data.rbge.org.uk/herb/E00421503 | Species Occurrence, Specimens from CETAF institutions |
| RRID | RRID:MGI:5630441 | Biomedical Research Resources |
| BioSample accession number | SAMN03983893 | Biological source materials used in experimental assays |

[i] Acronyms: ARK = Archival Resource Key, URN = Uniform Resource Name, URI = Uniform Resource Identifier, DOI = Digital Object Identifier, UUID = Universally Unique Identifier, IGSN = International GeoSample Number, CETAF = Consortium of European Taxonomic Facilities, RRID = Research Resource Identifier.

Figure 2

Sample journey map, using the sample PID and metadata to document sample history and link related samples in the WHONDRS project (Stegen and Goldman 2018; Toyoda et al. 2020).

PNNL = Pacific Northwest National Laboratory; EMSL = Environmental Molecular Sciences Laboratory; ORNL = Oak Ridge National Laboratory; GOLD = Genomes Online Database.

Figure 3

Options for assigning IDs to sets or chains of highly related samples and subsamples. There is uncertainty among domain scientists about whether to assign new PIDs to subsamples. Based on our pilot test feedback, options 2 and 3 are most efficient for soil cores and water samples, respectively. Relationship metadata can be inferred from the type of ID (e.g. collection or site ID) and the order of Parent IGSNs, and assists machine reconstruction of the sampling hierarchy from original feature or sample through subsequent child samples.
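
As a minimal sketch of how a machine could reconstruct the sampling hierarchy from parent identifiers, the Python below follows parent links from a subsample back to its original sample. The sample IDs and the `PARENT_OF` mapping are hypothetical placeholders, not real IGSNs or fields from the IGSN template.

```python
from typing import Dict, List, Optional

# Hypothetical records: each sample ID maps to its parent ID (None for an original sample).
PARENT_OF: Dict[str, Optional[str]] = {
    "CORE-001": None,             # original soil core
    "CORE-001-A": "CORE-001",     # section cut from the core
    "CORE-001-A1": "CORE-001-A",  # subsample of that section
    "WATER-001": None,            # unrelated water sample
}


def lineage(sample_id: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of IDs from the original sample down to sample_id."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = sample_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent links at {current}")
        if current not in parent_of:
            raise KeyError(f"unknown sample ID: {current}")
        seen.add(current)
        chain.append(current)
        current = parent_of[current]
    # The walk goes child -> parent, so reverse it to read root first.
    return chain[::-1]


print(" > ".join(lineage("CORE-001-A1", PARENT_OF)))
```

The cycle and unknown-ID checks matter in practice because parent links are entered by hand in metadata templates and can be mistyped.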

Table 2

Mapping of key fields to promote interoperability between geoscience (IGSN) and associated metagenomic samples (BioSample). Minimum Information about Any Sequence (MIxS)/Minimum Information about any Metagenomic Sequence (MIMS) templates require or encourage use of the Environment Ontology (ENVO) to describe environmental context and materials, and the GAZETTEER ontology (GAZ) for place names.

| IGSN FIELD | MIXS/MIMS FIELD |
| --- | --- |
| IGSN | Source material ID (can include the full link to sample landing page) |
| Material | Environmental medium* = ENVO |
| Related to Material | organism (e.g. soil metagenome) |
| Physiographic feature | local scale environmental context* = ENVO |
| N/A | broad scale environmental context* = ENVO |
| Country | geographic location (country or region) = GAZ |
| N/A | sample material processing |

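
A field mapping like the one in Table 2 can be applied mechanically. The sketch below is illustrative only: it uses the human-readable field labels from a subset of the table rows as dictionary keys, and the record values are placeholders rather than real metadata or ENVO/GAZ terms.

```python
from typing import Dict, List, Tuple

# Subset of the Table 2 mapping, keyed by IGSN field label.
IGSN_TO_MIXS: Dict[str, str] = {
    "IGSN": "source material ID",
    "Material": "environmental medium",
    "Physiographic feature": "local scale environmental context",
    "Country": "geographic location (country or region)",
}

# MIxS/MIMS fields listed in Table 2 with no IGSN counterpart (N/A rows).
MIXS_ONLY: List[str] = [
    "broad scale environmental context",
    "sample material processing",
]


def to_mixs(igsn_record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Translate an IGSN-style record and list the MIxS fields left unfilled."""
    mixs = {
        IGSN_TO_MIXS[field]: value
        for field, value in igsn_record.items()
        if field in IGSN_TO_MIXS
    }
    unfilled = [
        field
        for field in list(IGSN_TO_MIXS.values()) + MIXS_ONLY
        if field not in mixs
    ]
    return mixs, unfilled


# Placeholder record; values are for illustration only.
record = {"IGSN": "IECUR0002", "Material": "soil", "Country": "United States"}
mixs_record, still_needed = to_mixs(record)
print(mixs_record)
print(still_needed)
```

Reporting the unfilled fields makes the N/A rows visible: those values have to be supplied separately when registering the associated BioSample.
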
Figure 4

Example of using related identifiers to link related samples and information. Related identifiers are listed in blue. All metadata can be provided at the sample level or by providing separate files (depicted as boxes) for higher-level collections of samples, sampling events, methods, and/or locations. When providing separate spreadsheet files, each file (e.g. locations file) contains a row for each unique related identifier (e.g. location ID), with the associated metadata fields (e.g. location description) as columns. Unique identifiers for these related, higher-level entities then allow associating relevant metadata (e.g. latitude and longitude) with individual samples. This practice is flexible and optional, depending on data management needs and preferences.
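
The separate-file practice described above amounts to a lookup on the related identifier. The following sketch assumes two small CSV files held in memory; the column names (`location_id`, `sample_id`, and so on) and all values are hypothetical and not taken from the IGSN-ESS template.

```python
import csv
import io
from typing import Dict, List

# Hypothetical locations file: one row per unique location ID.
LOCATIONS_CSV = """location_id,location_description,latitude,longitude
SITE-A,Upstream gravel bar,46.25,-119.50
SITE-B,Downstream pool,46.20,-119.45
"""

# Hypothetical samples file: each sample points to a location by its ID.
SAMPLES_CSV = """sample_id,location_id,material
S-001,SITE-A,water
S-002,SITE-A,sediment
S-003,SITE-B,water
"""


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_locations(
    samples: List[Dict[str, str]], locations: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Copy location metadata onto each sample via the shared location ID."""
    by_id = {row["location_id"]: row for row in locations}
    merged = []
    for sample in samples:
        location = by_id.get(sample["location_id"], {})
        # Sample-level values take precedence if a column appears in both files.
        merged.append({**location, **sample})
    return merged


merged = attach_locations(read_rows(SAMPLES_CSV), read_rows(LOCATIONS_CSV))
for row in merged:
    print(row["sample_id"], row["latitude"], row["longitude"])
```

Because the location metadata is stored once per location ID, a corrected coordinate only has to be changed in one row of the locations file.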

Table 3

Summary of preliminary issues and solutions encountered in assigning SESAR IGSN metadata to sample locations. While the most basic location information is included (e.g. latitude, longitude, and location description), more work is needed on interoperability with standards that describe site locations more fully, such as the metadata standards developed by the Open Geospatial Consortium. In multidisciplinary ecosystem sciences, location descriptions cover samples as well as other entities, such as sensor infrastructure in monitoring networks and remote sensing data.

| Location ID | If there is a project-specific site/location name, you must currently provide this in the free-text location description field. We therefore added LocationID as a field, which can be associated with metadata and does not need to be globally unique. Sample metadata contains location fields but is not intended to fully describe site/location information. |
| Location Hierarchies | We do not address a standard way to represent complex location hierarchies (e.g. basins, watersheds, wells, depths within wells), which is needed but is out of scope for the current effort. |
| Plot Name | Many projects are located in remote areas where GPS coordinates are not reliable and yet specific locations are necessary. Therefore, plots are formally defined and distance from specific points documented in the field using a relative reference system. Currently, users must describe this within the Location Description metadata field. |
| Uncertainty or precision of geographic coordinates | We could add a metadata field to provide detail on the uncertainty in the geographic coordinates, as done in Darwin Core. However, we found that participants sometimes do not have this information. Certain instruments (e.g. smartphones) do not provide an easy way to specify uncertainty. It may therefore be more efficient to simply indicate the specific instrument used, which provides information on the likely uncertainty or precision of the coordinates. Additional terms are needed to specify the instrument used. |
| Sampling feature/well type | There are no controlled vocabularies within the current IGSN template to characterize the type of well. We currently recommend providing this information in the free-text location description. |

Figure 5

Sample metadata for Environmental Systems Sciences (IGSN-ESS). Each sample metadata element is listed under a general category of information. Required fields are marked with an asterisk (*). Fields added to IGSN metadata or revised from Darwin Core (DwC), MIxS, Environment Ontology (ENVO), Biological Collections Ontology (BCO), or Plant Ontology (PO) are indicated in parentheses.

Language: English
Submitted on: Dec 5, 2020
Accepted on: Feb 12, 2021
Published on: Mar 18, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Joan E. Damerow, Charuleka Varadharajan, Kristin Boye, Eoin L. Brodie, Madison Burrus, K. Dana Chadwick, Robert Crystal-Ornelas, Hesham Elbashandy, Ricardo J. Eloy Alves, Kim S. Ely, Amy E. Goldman, Ted Haberman, Valerie Hendrix, Zarine Kakalia, Kenneth M. Kemner, Annie B. Kersting, Nancy Merino, Fianna O'Brien, Zach Perzan, Emily Robles, Patrick Sorensen, James C. Stegen, Ramona L. Walls, Pamela Weisenhorn, Mavrik Zavarin, Deborah Agarwal, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.