A Framework for Active DMPs in Photon and Neutron Science Large-Scale Facilities

Open Access | Jan 2024

Figures & Tables

Figure 1

An idealized facility research lifecycle, simplified (Matthews et al. 2012).

Figure 2

Metadata collected and information systems supporting the stages of the Experimental lifecycle.

Figure 3

Increasing knowledge throughout projects’ runtime and information flow.

Table 1

DMP Phases.

DMP PHASE | ACTOR PROVIDING INFORMATION FOR THE DMP
0 Before proposal submission | Typically, knowledge of instrument from scientist or RDM team (static parameter).
1 Proposal submission | Typically, knowledge of the researcher, with support from the facility administration and RDM team.
2 Accepted experiment planning | Typically, knowledge of the researcher, with support from the facility administration and instrument scientist.
3 Data Collection/Data processing/analysis | Typically, knowledge of the user, with support from the instrument scientist.
Table 2

PMBOK®’s phases and roles in RDM information collection.

ROLE | BEFORE PROJECT/OPA/EEF | PROJECT INITIATION | PROJECT PLANNING | PROJECT EXECUTION | PROJECT FINALISATION | AFTER PROJECT/PA
Instrument scientist
  • Instrument/software description, selection of applicable metadata standards, general dataset description
  • Required project specific software, used instrumentations and their configurations, and standards
  • Adding/actualisation of instruments and software information
Data manager
  • Controlled vocabularies and standards administration, mapping metadata to standards, general data policies, policy execution
  • Automatic metadata extraction and validation
  • Open access of research data, validation of policy execution; actualization of standards and policies
User office
  • Proposal information, instrument to be used, (co-)proposers
Experimental team (research)
  • Concrete dataset description, references to additional information, metadata schema selection, estimated amount of datasets produced, dataset usage, special (own) software infrastructure requirements
  • Experiment execution: parameter and configurations
  • Dataset selection, metadata completion, and validation
Experimental team (administration)
  • Specific policies and DMP requirements for project, funding, participating researchers
  • DMP actualization
  • DMP actualization after experiments
  • DMP actualization
Table 3

Facility wide metadata.

FACILITY INFORMATION
repository: The repository information comprises the name and access URL where the data is made accessible.
licence: The license usually applied to the data in the repository.
security: Information about, e.g., backups and replicas of the data and other special security information.
pid_system: The default PID system applied in the repository, e.g., handles or Digital Object Identifiers (DOIs).
personal_data: How personal information in research data is treated, in case this is handled at facility level, e.g., no personal information is allowed in research data beyond what is required for provenance.
min_storage_period: The minimum period research data has to be available for good scientific practice.
archive: Data archive used. If it is the same as the repository, then no URL needs to be provided, as the access procedures have to be described.
certificate: Whether the repository is certified and with which certificate, e.g., CoreTrustSeal.
arrangements: For the data produced in a research project, an arrangement with the data repository that will receive the research data has to be made. In case a proposal in PaN facilities is approved, it normally includes the usage of the repository.
embargo_period: In the data policy of the PaN facilities, there is also an embargo period defined.
access_control: How the access to the data repository is controlled.
costs: In case a proposal in PaN facilities is approved, it also normally includes the costs for research data management.
Table 4

Metadata for scientific techniques related to metadata schema and file format.

technique: Describes a scientific technique.
      name: A name or label of the technique.
      PID: E.g., the IRI in the PaNET Ontology (Collins et al. 2021).
      metadata schema: Related to the technique are the requirements on metadata. There can be more than one metadata schema and format used in practice.
      structure
            format
            tools
                  reading: Software for possible usage.
                  writing: Possible software for writing the format.
                  validation: Tools for validating the data against, e.g., NeXus application definitions.
                        validation schema: Schema complies with metadata schema above, used for validation with a tool.
Table 5

Metadata for datasets.

DATASET
name: a default name
description: a project independent description of the dataset that can be adapted in projects
contributor: contributing persons, typically identified via ORCIDs
reproducible: whether the dataset is reproducible and with what effort
interested_community: might be derived from disciplines
usage: there might be some default usage scenarios, like calibration; otherwise the datasets’ intended usage in the project
archival: DMP question; moment, selection_criteria, and long_term_archival_reason might be used for automated execution and validation
data_security: measurements and responsible person
techniques: scientific techniques used to create the dataset
filecollections: a collection of files created by one software instance; a dataset can contain more than one filecollection
            name: a default name
            resource: instrument or laboratory used to create the filecollection; preferably identified by PID
            storage: location and access to experimental storage
            backup: location and access to experimental storage
            quality_assurance: description and pointers of, e.g., validation workflows
            hardware: description of hardware components used to create the dataset; used for data curation
            writing_software: description of software and its components used to create the dataset; used for data curation
            files: files
                  name: can be a regular expression definition of default file names
                  format: the format of the file (could be related to a format registry and relates to the technique table above)
                  metadata_schema: the metadata schema applied in the file (could be related to a metadata schema registry and relates to the technique table above)
                  size: expected minimum and maximum size of the file; average size
                  amount: quantity of files; can be used together with size for estimating overall size and validation
processing_requirements: hardware and software requirements for processing the data
hardware_requirements
type: type of hardware requirements like storage or processors of a certain type of computer; manufacturer and model are required
reading_software: possible software to use the data, including access and documentation, as well as required plugins
            name
            PID
            type
            documentation
            URL
            plugins
                  name
                  type
                  URL
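The dataset metadata above notes that a file name can be given as a regular expression and that size and amount can be combined to estimate the overall size and to validate files. The following is a minimal sketch of that idea, not the authors' implementation. The class `FileSpec`, its field names `min_size` and `max_size`, and both functions are hypothetical names introduced here for illustration, and sizes are assumed to be in bytes.

```python
import re
from dataclasses import dataclass


@dataclass
class FileSpec:
    """Hypothetical rendering of the 'files' entry of a filecollection."""
    name: str      # regular expression for default file names
    min_size: int  # expected minimum size per file (assumed bytes)
    max_size: int  # expected maximum size per file (assumed bytes)
    amount: int    # expected quantity of files


def estimate_total_size(spec: FileSpec) -> tuple[int, int]:
    """Return the (minimum, maximum) expected overall size."""
    return spec.amount * spec.min_size, spec.amount * spec.max_size


def validate_files(spec: FileSpec, files: dict[str, int]) -> list[str]:
    """Check observed files (name -> size) against the expectations."""
    problems = []
    pattern = re.compile(spec.name)
    for fname, size in sorted(files.items()):
        if not pattern.fullmatch(fname):
            problems.append(f"{fname}: name does not match {spec.name!r}")
        elif not spec.min_size <= size <= spec.max_size:
            problems.append(f"{fname}: size {size} outside expected range")
    if len(files) != spec.amount:
        problems.append(f"expected {spec.amount} files, found {len(files)}")
    return problems


if __name__ == "__main__":
    spec = FileSpec(name=r"scan_\d{4}\.nxs", min_size=1000, max_size=5000, amount=2)
    print(estimate_total_size(spec))
    print(validate_files(spec, {"scan_0001.nxs": 1200, "notes.txt": 10}))
```

An estimate of this kind could feed the DMP question on expected data volume, while the validation result could be reported back when the DMP is updated after the experiment.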
Table 6

Metadata for operations.

POLICY
name: The name of the policy.
constraint: When the operation should be executed.
            type: Event trigger or scheduled.
            value: Triggering event, e.g., onCreation, or date and time.
parameters: Array of required parameters like path to a file or metadata schema for validation. The parameters are divided into input and output parameters.
operation: Policy related operation or workflow (referencing an executable workflow).
categories: For finding the operation, e.g., validation, integrity, format, extraction, interoperability.
description: A textual description about what the operation does.
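To illustrate how a policy with an event or scheduled constraint might be selected and executed, here is a small sketch under stated assumptions. The `Policy` class, the functions `due_policies`, `run`, and `check_extension`, and the convention that parameters are a dictionary with an `input` key are all hypothetical. Only the field names and the `onCreation` event come from the operations metadata above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class Policy:
    name: str
    constraint_type: str       # "event" or "scheduled"
    constraint_value: object   # event name (str) or a datetime
    operation: Callable[[dict], dict]
    parameters: dict = field(default_factory=dict)
    categories: tuple = ()
    description: str = ""


def due_policies(policies, event=None, now=None):
    """Select policies triggered by the event or whose scheduled time has passed."""
    selected = []
    for p in policies:
        if p.constraint_type == "event":
            if event is not None and p.constraint_value == event:
                selected.append(p)
        elif p.constraint_type == "scheduled":
            if now is not None and p.constraint_value <= now:
                selected.append(p)
    return selected


def run(policies, event=None, now=None):
    """Execute every due policy and collect its output parameters by name."""
    return {p.name: p.operation(p.parameters)
            for p in due_policies(policies, event, now)}


def check_extension(params: dict) -> dict:
    """Toy validation operation: does the file path carry the expected extension?"""
    inp = params["input"]
    return {"valid": inp["path"].endswith(inp["extension"])}


if __name__ == "__main__":
    policies = [
        Policy("validate_on_creation", "event", "onCreation", check_extension,
               {"input": {"path": "scan_0001.nxs", "extension": ".nxs"}},
               categories=("validation",)),
    ]
    print(run(policies, event="onCreation"))
```

A real operation would reference an executable workflow rather than a local function. The callable here only stands in for that reference.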
Table 7

Metadata for projects.

PROJECT
name: name of the project
description: project description
funding reference: the usage of a PID is advisable for later curation and integration into a graph model
members: here as well, the usage of ORCIDs is advisable
start_date: start of the project
end_date: end of the project
disciplines/keywords: to retrieve RDM requirements and improve findability of the data
jurisdictions: to retrieve policy requirements; jurisdictions can be funders, national, institutional, or a laboratory/instrument
resource: instrument or laboratory used to create data; used to retrieve possible dataset types created by the resource
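The jurisdictions field is described as a key for retrieving policy requirements. One way this lookup could work is sketched below. The jurisdiction identifiers, the requirement keys, the numeric values, and the rule that the longest storage period wins are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical knowledge-base excerpt: requirements per jurisdiction.
REQUIREMENTS = {
    "funder:example-funder": {"min_storage_years": 10, "open_access": True},
    "facility:example-facility": {"min_storage_years": 5, "embargo_years": 3},
}


@dataclass
class Project:
    name: str
    jurisdictions: list[str]


def collect_requirements(project: Project, knowledge_base=REQUIREMENTS) -> dict:
    """Merge the requirements of all jurisdictions a project falls under.

    Assumption: for the storage period the strictest (longest) value applies;
    for any other key the first jurisdiction that defines it wins.
    """
    merged = {}
    for jurisdiction in project.jurisdictions:
        for key, value in knowledge_base.get(jurisdiction, {}).items():
            if key == "min_storage_years":
                merged[key] = max(value, merged.get(key, 0))
            else:
                merged.setdefault(key, value)
    return merged


if __name__ == "__main__":
    project = Project("example-project",
                      ["funder:example-funder", "facility:example-facility"])
    print(collect_requirements(project))
```

Unknown jurisdictions contribute nothing in this sketch. A production system would more likely flag them for the data manager to resolve.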
Figure 4

Relations of the Dataset class.

Figure 5

Relations of the Project class.

Figure 6

Relationships associated with the Resource class and its associated technique.

Table 8

Pre-existing information lifecycle phases and related activities.

LIFECYCLE PHASE | ACTIVITIES
Before project (OPA/EEF) | Retrieve initial information for central knowledge base about Datasets of Resource from repository.
General update/insert knowledge base:
  • Metadata standards
  • Formats
  • Policies
  • Mappings
Project initiation | Insert new project
Relate to resource
Project planning | Specify project's datasets:
  • Create project specific datasets
  • Relate to default datasets of resource
  • Update description
Create DMP:
  • Retrieve project's datasets and related information from central knowledge base and insert into a DMP tool
  • Retrieve facility specific information
  • Update information in DMP tool
Create concrete policy execution environment:
  • Retrieve operations related to the datasets into the execution environment
Project execution | Update DMP
  • Retrieve project's datasets from the repository and update information in DMP tool
Execute operations on datasets
Project finalization | Update DMP
Execute operations
Select datasets for archival
Update concrete dataset descriptions
After project (OPA) | Validate pre-existing information against project's datasets in the repository:
  • Update central knowledge base
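The project planning activities above (relate to default datasets of the resource, update descriptions, retrieve facility specific information, insert into a DMP tool) can be sketched as a single pre-filling step. This is an illustrative sketch only. The in-memory `KNOWLEDGE_BASE`, its contents, and the function `create_dmp_draft` are hypothetical stand-ins for the central knowledge base and the DMP tool.

```python
import copy

# Hypothetical stand-in for the central knowledge base.
KNOWLEDGE_BASE = {
    "resources": {
        "example-instrument": {
            "default_datasets": [
                {"name": "calibration",
                 "description": "default calibration data",
                 "usage": "calibration"},
            ],
        },
    },
    "facility": {"repository": "example-repository", "pid_system": "DOI"},
}


def create_dmp_draft(project_name, resource, overrides=None, kb=KNOWLEDGE_BASE):
    """Pre-fill a DMP draft from pre-existing information.

    Default datasets of the resource are copied, then project specific
    updates (keyed by dataset name) are applied on top of the copies.
    """
    overrides = overrides or {}
    datasets = []
    for default in kb["resources"][resource]["default_datasets"]:
        dataset = copy.deepcopy(default)  # never mutate the knowledge base
        dataset.update(overrides.get(dataset["name"], {}))
        datasets.append(dataset)
    return {
        "project": project_name,
        "resource": resource,
        "facility": dict(kb["facility"]),
        "datasets": datasets,
    }


if __name__ == "__main__":
    draft = create_dmp_draft(
        "example-project", "example-instrument",
        overrides={"calibration": {"description": "calibration run for sample A"}},
    )
    print(draft)
```

Copying the defaults before applying project updates keeps the knowledge base unchanged during the project, so it can still be validated against the repository content in the after-project phase.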

Figure 7

Components for an aDMP system (Görzig et al. 2022).

Figure 8

Enhancement and use of pre-existing information.

Language: English
Submitted on: Dec 15, 2022
Accepted on: Dec 1, 2023
Published on: Jan 22, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Heike Görzig, Alejandra N. Gonzalez Beltran, Felix Engel, Brian Matthews, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.