A Robust, Format-Agnostic Scientific Data Transfer Framework

By: James Hester  
Open Access | Sep 2016

Figures & Tables

Table 1

Definitions of terms used in this paper.

| Term | Description |
| --- | --- |
| Dataname | a name for a concept with which one or more values can be associated |
| Data item | a single item of information, consisting of a dataname and one or more associated data values |
| Ontology | A collection of datanames and associated meanings, including relationships. Once ‘ologs’ have been defined (section 2.1), ‘ontology’ usually refers to an ontology expressed using an olog. |
| Data format | The structures in which the data are encapsulated for transfer, for example XML or HDF5. Informal discussions often use the word ‘format’ to encompass both the file format and the ontology used to interpret the dataitems found in it. To avoid confusion, the word ‘format’ is here used to refer only to the file structure. |
| Data bundle | A collection of data items |
| Dataname list | The subset of datanames from an ontology that are included in a given data bundle |
| Format adapter | A description of how the values associated with datanames are encoded in a particular data format |
| Transfer specification | The combination of a format adapter with a dataname list |
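The relationships among the terms in Table 1 can be illustrated with a small data-structure sketch. This is only one possible reading of the definitions, not the paper's implementation: the class names, the method names, the use of plain strings for format locations, and the sample wavelength value are all assumptions made for illustration. The two location strings are taken from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """A dataname together with one or more associated values."""
    dataname: str
    values: tuple


@dataclass
class TransferSpecification:
    """A dataname list combined with a format adapter.

    The adapter is modelled here (an assumption) as a mapping from
    dataname to a location string in the target data format.
    """
    dataname_list: list
    format_adapter: dict

    def missing_from(self, bundle):
        """Datanames in the list that the data bundle does not supply."""
        present = {item.dataname for item in bundle}
        return [name for name in self.dataname_list if name not in present]

    def unencodable(self):
        """Datanames in the list that the adapter cannot place in the format."""
        return [name for name in self.dataname_list
                if name not in self.format_adapter]


# A data bundle is simply a collection of data items (values are illustrative).
bundle = [
    DataItem("incident wavelength", (1.54,)),
    DataItem("scan ID", ("scan1",)),
]

spec = TransferSpecification(
    dataname_list=["incident wavelength", "scan ID", "axis ID"],
    format_adapter={
        "incident wavelength": "NXinstrument.NXmonochromator.wavelength",
        "scan ID": "NXentry",
    },
)

missing = spec.missing_from(bundle)
unencodable = spec.unencodable()
print(missing, unencodable)
```

Keeping the dataname list separate from the adapter mirrors the definitions: the list says what must be transferred, while the adapter says only how each value is laid out in a given format.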
Figure 1

A simple ontology using the olog formalism.

Figure 2

Adding a new dependency to an ontology. Adding a dependency on isotope requires definition of a new dataname denoted here by “isotopically pure experimental neutron scattering cross-section”.

Figure 3

Using pullbacks to relate restrictive definitions to broader definitions. The types on the top left of each square are pullbacks from the types beneath and to the right. Values for datanames corresponding to identifiers and functions in the top row can be automatically derived from data described using the bottom row, and vice versa.

Figure 4

Table 3 as a hierarchy. Six repetitions of dataname 1 values and four repetitions of dataname 2 values are removed, saving space.

Table 2

Datanames used in the NXmx-imgCIF translation example. The canonical names are those datanames used by the API. Intermediate datanames used during dictionary-based translation are not included. The imgCIF and NeXus columns identify equivalent locations in the respective formats, where they exist.

| Canonical name | imgCIF | NXmx | Depends on | Comments |
| --- | --- | --- | --- | --- |
| incident wavelength | diffrn_radiation_wavelength.wavelength | NXinstrument.NXmonochromator.wavelength | incident wavelength ID | |
| wavelength ID | diffrn_radiation_wavelength.id | From order of appearance in NXbeam.wavelength | | |
| scan ID | diffrn.id | NXentry | | |
| frame axis location axis ID | diffrn_scan_frame_axis.axis_id | NXtransformations | | |
| frame axis location frame ID | diffrn_scan_frame_axis.frame_id | from order of appearance in NXtransformations.position | | |
| frame axis location angular position | diffrn_scan_frame_axis.angle | NXtransformations.position | frame axis location frame ID, frame axis location axis ID | NXmx actually uses an array attached to the group node. Due to limitations in the third-party NeXus access software, we have created a “position” field. |
| axis ID | axis.id | | | |
| axis type | axis.type | | | |
| axis vector | axis.vector | | | |
| axis offset | axis.offset | | | |
| {goniometer, simple detector} axis ID | | NX{sample,detector}.NXtransformations | | “Simple” denotes uncoupled axes |
| {goniometer, simple detector} axis vector mcstas | | NX{sample,detector}.NXtransformations@vector | axis ID | |
| {goniometer, simple detector} axis offset mcstas | | NX{sample,detector}.NXtransformations@offset | axis ID | |
| 2D data identifier | array_data.binary_id | | | |
| 2D data | | | 2D data identifier | Not the same as simple scan data as imgCIF data may have a variety of layouts or come from submodules |
| simple scan data | | NXinstrument.NXdetector.NXdata.data | simple scan frame frame ID | |
| simple scan frame frame ID | | from order of appearance in NXdata.data | | |
| data axis ID | | NXdetector.NXdata.axes | | Must unpack to get values |
| data axis precedence | | From ordering in NXdetector.NXdata.axes | data axis ID | |
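Several rows in Table 2 give an identifier as coming "from order of appearance" in an array, with another dataname depending on it. The sketch below shows one way such identifiers could be generated and then used to key the dependent values. The function name `ids_from_order`, the 1-based numbering, the string form of the identifiers, and the sample wavelengths are all assumptions; the table does not specify them.

```python
def ids_from_order(values):
    """Assign an identifier to each value from its position in the sequence.

    Assumption: identifiers are 1-based positions rendered as strings.
    """
    return [str(position) for position, _ in enumerate(values, start=1)]


# Illustrative stand-in for the contents of a wavelength array.
wavelengths = [0.98, 1.54]

wavelength_ids = ids_from_order(wavelengths)

# Each wavelength value becomes a function of its wavelength ID,
# matching the "Depends on" column.
incident_wavelength = dict(zip(wavelength_ids, wavelengths))
print(incident_wavelength)
```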
Table 3

A hypothetical data table where datanames 4, 5 and 6 are each functions of the combined values of datanames 1, 2 and 3.

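| Dataname 1 | Dataname 2 | Dataname 3 | Dataname 4 | Dataname 5 | Dataname 6 |
| --- | --- | --- | --- | --- | --- |
| A | Q | X | 1.1 | T | ‘blue’ |
| A | Q | Y | 2.1 | F | ‘green’ |
| A | R | X | 1.5 | F | ‘yellow’ |
| A | R | Y | 11.2 | T | ‘green’ |
| B | Q | X | 2.1 | F | ‘yellow’ |
| B | Q | Y | 3.1 | T | ‘pink’ |
| B | R | X | 4.5 | F | ‘yellow’ |
| B | R | Y | –1.2 | T | ‘brown’ |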
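The space saving described in the Figure 4 caption can be checked with a short sketch that nests the rows of Table 3 by their first three datanames. The nested-dictionary layout and the function names are assumptions chosen for illustration; the figure itself may arrange the hierarchy differently.

```python
# Rows of Table 3: (dataname 1, 2, 3, 4, 5, 6).
rows = [
    ("A", "Q", "X", 1.1, "T", "blue"),
    ("A", "Q", "Y", 2.1, "F", "green"),
    ("A", "R", "X", 1.5, "F", "yellow"),
    ("A", "R", "Y", 11.2, "T", "green"),
    ("B", "Q", "X", 2.1, "F", "yellow"),
    ("B", "Q", "Y", 3.1, "T", "pink"),
    ("B", "R", "X", 4.5, "F", "yellow"),
    ("B", "R", "Y", -1.2, "T", "brown"),
]


def to_hierarchy(table, n_keys=3):
    """Nest rows by their first n_keys columns; leaves hold the remaining values."""
    tree = {}
    for row in table:
        node = tree
        for key in row[:n_keys - 1]:
            node = node.setdefault(key, {})
        node[row[n_keys - 1]] = row[n_keys:]
    return tree


tree = to_hierarchy(rows)

# Count how many times each key level is stored in the hierarchy.
stored_level_1 = len(tree)
stored_level_2 = sum(len(children) for children in tree.values())

# The flat table stores one value per row at every level.
removed_level_1 = len(rows) - stored_level_1
removed_level_2 = len(rows) - stored_level_2
print(removed_level_1, removed_level_2)
```

With eight rows, two distinct dataname 1 values, and four dataname 1/dataname 2 combinations, the counts agree with the caption: six and four repetitions removed respectively.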
Language: English
Submitted on: Mar 23, 2016 | Accepted on: Sep 12, 2016 | Published on: Sep 30, 2016
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2016 James Hester, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.