
Figure 1
Tracking interdisciplinary samples throughout the cycle of field collection, transport to collaborators and other labs, various analyses, and digital records.
Table 1
Examples of PIDs that have been used for samples, modified from Guralnick et al. (2015).
| IDENTIFIER TYPE | IDENTIFIER EXAMPLE | SCOPE |
|---|---|---|
| ARK | ark:/12148/btv1b8449691v | Flexible |
| URN | urn:catalog:UMMZ:Mammals:171041 | Flexible |
| HTTP URI | http://data.rbge.org.uk/herb/E00115694 | Flexible |
| DOI | 10.7299/X7VQ32SJ | Flexible, mostly papers and datasets |
| UUID | EF0A4D3E-702F-4882-81B8-CA737AEB7B28 | Flexible |
| IGSN | IGSN: IECUR0002 | Geoscience, working to become general physical sample identifier |
| CETAF URI, based on HTTP URI | http://data.rbge.org.uk/herb/E00421503 | Species Occurrence, Specimens from CETAF institutions |
| RRID | RRID:MGI:5630441 | Biomedical Research Resources |
| BioSample accession number | SAMN03983893 | Biological source materials used in experimental assays |
[i] Acronyms: ARK = Archival Resource Key, URN = Uniform Resource Name, URI = Uniform Resource Identifier, DOI = Digital Object Identifier, UUID = Universally Unique Identifier, IGSN = International GeoSample Number, CETAF = Consortium of European Taxonomic Facilities, RRID = Research Resource Identifier.
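
As a rough illustration of how the identifier types in Table 1 differ syntactically, the following sketch guesses the type of a PID string. The patterns are assumptions derived only from the example identifiers in the table, not the official syntax of any scheme, and `guess_pid_type` is a hypothetical helper name.

```python
import re

# Illustrative patterns inferred from the Table 1 examples only (assumption);
# they are not the normative syntax rules of any identifier scheme.
PID_PATTERNS = [
    ("ARK", re.compile(r"^ark:/\d+/\S+$")),
    ("URN", re.compile(r"^urn:[A-Za-z0-9-]+:\S+$")),
    ("RRID", re.compile(r"^RRID:\S+$")),
    ("IGSN", re.compile(r"^IGSN:\s*[A-Z0-9]+$")),
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("UUID", re.compile(r"^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$")),
    ("BioSample accession number", re.compile(r"^SAM[NED][A-Z]?\d+$")),
    # Checked last because it is the most general pattern.
    ("HTTP URI", re.compile(r"^https?://\S+$")),
]


def guess_pid_type(identifier: str) -> str:
    """Return the first identifier type whose pattern matches, else 'unknown'."""
    text = identifier.strip()
    for name, pattern in PID_PATTERNS:
        if pattern.match(text):
            return name
    return "unknown"


for example in ["ark:/12148/btv1b8449691v", "10.7299/X7VQ32SJ", "SAMN03983893"]:
    print(example, "->", guess_pid_type(example))
```

A CETAF URI cannot be told apart from a generic HTTP URI by syntax alone, so this sketch reports both as "HTTP URI".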

Figure 2
Sample journey map, using the sample PID and metadata to document sample history and link related samples in the WHONDRS project (Stegen and Goldman 2018; Toyoda et al. 2020).
PNNL = Pacific Northwest National Laboratory; EMSL = Environmental Molecular Sciences Laboratory; ORNL = Oak Ridge National Laboratory; GOLD = Genomes Online Database.

Figure 3
Options for assigning IDs to sets or chains of highly related samples and subsamples. There is uncertainty among domain scientists about whether to assign new PIDs to subsamples. Based on our pilot test feedback, options 2 and 3 are most efficient for soil cores and water samples, respectively. Relationship metadata can be inferred from the type of ID (e.g. collection or site ID) and the order of Parent IGSNs, and assists machine reconstruction of the sampling hierarchy from original feature or sample through subsequent child samples.
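
The machine reconstruction of a sampling hierarchy from parent links, as described in the Figure 3 caption, could look roughly like the sketch below. It assumes each record carries at most one parent identifier. The identifiers and the `lineage` function are hypothetical and are not taken from the IGSN specification.

```python
from typing import Dict, List, Optional


def lineage(sample_id: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from the original feature or sample down to sample_id."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = sample_id
    while current is not None:
        if current in seen:
            # Guard against malformed metadata in which parent links loop.
            raise ValueError(f"cycle in parent links at {current}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    # The walk goes child -> parent, so reverse to get parent -> child order.
    return list(reversed(chain))


# Hypothetical identifiers: a site, a soil core taken there, and a subsample.
parent_of = {
    "SITE-001": None,
    "CORE-001": "SITE-001",
    "CORE-001-A": "CORE-001",
}

print(lineage("CORE-001-A", parent_of))
# ['SITE-001', 'CORE-001', 'CORE-001-A']
```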
Table 2
Mapping of key fields to promote interoperability between geoscience (IGSN) and associated metagenomic samples (BioSample). Minimum Information about any (x) Sequence (MIxS)/Minimum Information about a Metagenome Sequence (MIMS) templates require or encourage use of the Environment Ontology (ENVO) to describe environmental context and materials, and the Gazetteer ontology (GAZ) for place names.
| IGSN FIELD | MIXS/MIMS FIELD |
|---|---|
| IGSN | Source material ID (can include the full link to sample landing page) |
| Material | Environmental medium* = ENVO |
| Related to Material | organism (e.g. soil metagenome) |
| Physiographic feature | local scale environmental context* = ENVO |
| N/A | broad scale environmental context* = ENVO |
| Country | geographic location (country or region) = GAZ |
| N/A | sample material processing |
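
A minimal sketch of how the field mapping in Table 2 might be applied in practice is given below. The dictionary keys follow the table, while the `to_mixs` function and the record values are hypothetical. The sketch copies values as-is and does not translate them into ENVO or GAZ terms, which would still be needed for a valid MIxS/MIMS submission.

```python
from typing import Dict, Optional

# Field correspondences taken from Table 2 (IGSN field -> MIxS/MIMS field).
IGSN_TO_MIXS = {
    "IGSN": "Source material ID",
    "Material": "Environmental medium",
    "Related to Material": "organism",
    "Physiographic feature": "local scale environmental context",
    "Country": "geographic location (country or region)",
}

# MIxS/MIMS fields that Table 2 lists with no IGSN counterpart (N/A).
MIXS_ONLY = ["broad scale environmental context", "sample material processing"]


def to_mixs(igsn_record: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Rename IGSN fields to their MIxS/MIMS counterparts.

    Fields with no IGSN source are set to None so that gaps stay visible.
    """
    mixs: Dict[str, Optional[str]] = {
        mixs_field: igsn_record[igsn_field]
        for igsn_field, mixs_field in IGSN_TO_MIXS.items()
        if igsn_field in igsn_record
    }
    for field in MIXS_ONLY:
        mixs.setdefault(field, None)
    return mixs


# Hypothetical record; the values are for illustration only.
record = {"IGSN": "IECUR0002", "Material": "soil"}
print(to_mixs(record))
```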

Figure 4
Example of using related identifiers to link related samples and information. Related identifiers are listed in blue. All metadata can be provided at the sample level or by providing separate files (depicted as boxes) for higher-level collections of samples, sampling events, methods, and/or locations. When providing separate spreadsheet files, each file (e.g. locations file) contains a row for each unique related identifier (e.g. location ID), with the associated metadata fields (e.g. location description) as columns. Unique identifiers for these related, higher-level entities then allow associating relevant metadata (e.g. latitude and longitude) with individual samples. This practice is flexible and optional, depending on data management needs and preferences.
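
The practice of keeping location metadata in a separate file and joining it to samples through a location ID, as described in the Figure 4 caption, can be sketched as follows. The column names, identifiers, and coordinates are hypothetical, and `attach_locations` is an illustrative helper rather than part of any published template.

```python
import csv
import io

# Hypothetical samples file: one row per sample, with a related location ID.
SAMPLES_CSV = """sample_id,location_id,material
S1,LOC-A,soil
S2,LOC-A,water
S3,LOC-B,soil
"""

# Hypothetical locations file: one row per unique location ID.
LOCATIONS_CSV = """location_id,latitude,longitude,location_description
LOC-A,46.35,-119.27,River bank plot
LOC-B,46.40,-119.30,Upland plot
"""


def attach_locations(samples_text: str, locations_text: str) -> list:
    """Copy location metadata onto each sample row via its location_id."""
    locations = {
        row["location_id"]: row
        for row in csv.DictReader(io.StringIO(locations_text))
    }
    merged = []
    for sample in csv.DictReader(io.StringIO(samples_text)):
        # Samples with an unknown location ID keep only their own fields.
        location = locations.get(sample["location_id"], {})
        merged.append({**location, **sample})
    return merged


for row in attach_locations(SAMPLES_CSV, LOCATIONS_CSV):
    print(row["sample_id"], row["latitude"], row["longitude"])
```

Because the location description is stored once per location rather than once per sample, a correction to a site's coordinates only has to be made in one row.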
Table 3
Summary of preliminary issues and solutions encountered in assigning SESAR IGSN metadata to sample locations. While the most basic location information is included (e.g. latitude, longitude, and location description), our community needs more work on interoperability with standards that more fully describe site locations, such as metadata standards developed by the Open Geospatial Consortium. In multidisciplinary ecosystem sciences, location descriptions cover not only samples but also other entities, such as sensor infrastructure in monitoring networks and remote sensing data.
| ISSUE | DESCRIPTION AND CURRENT SOLUTION |
|---|---|
| Location ID | If there is a project-specific site/location name, users must currently provide it in the free-text location description field. We therefore added LocationID as a field, which can be associated with metadata and does not need to be globally unique. Sample metadata contains location fields, but is not intended to fully describe site/location information. |
| Location Hierarchies | We do not address a standard way to represent complex location hierarchies (e.g. basins, watersheds, wells, depths within wells), which is needed but is out of scope for the current effort. |
| Plot Name | Many projects are located in remote areas where GPS coordinates are not reliable and yet specific locations are necessary. Therefore, plots are formally defined, and distances from specific points are documented in the field using a relative reference system. Currently, users must describe this within the Location Description metadata field. |
| Uncertainty or precision of geographic coordinates | We could add a metadata field to provide detail on the uncertainty in the geographic coordinates, as done in Darwin Core. However, we found that participants sometimes do not have this information. Certain instruments (e.g. smartphones) do not provide an easy way to specify uncertainty. It may therefore be more efficient to simply indicate the specific instrument used, from which the likely uncertainty or precision of the coordinates can be inferred. Additional terms are needed to specify the instrument used. |
| Sampling feature/well type | There are no controlled vocabularies within the current IGSN template to characterize the type of well. We currently recommend providing this information in the free-text location description. |

Figure 5
Sample metadata for Environmental Systems Sciences (IGSN-ESS). Each sample metadata element is listed under a general category of information. Required fields are marked with an asterisk*. Fields added to IGSN metadata or revised from Darwin Core (DwC), MIxS, Environment Ontology (ENVO), Biological Collections Ontology (BCO), Plant Ontology (PO) are indicated in parentheses.
