A Robust, Format-Agnostic Scientific Data Transfer Framework

James Hester

doi:10.5334/dsj-2016-012

Figures & Tables

Table 1

Definitions of terms used in this paper.

Term	Description
Dataname	A name for a concept with which one or more values can be associated
Data item	A single item of information, consisting of a dataname and one or more associated data values
Ontology	A collection of datanames and associated meanings, including relationships. Once ‘ologs’ have been defined (section 2.1), ‘ontology’ usually refers to an ontology expressed using an olog.
Data format	The structures in which the data are encapsulated for transfer, for example XML or HDF5. Informal discussions often use the word ‘format’ to encompass both the file format and the ontology used to interpret the data items found in it. To avoid confusion, the word ‘format’ is used here to refer only to the file structure.
Data bundle	A collection of data items
Dataname list	The subset of datanames from an ontology that are included in a given data bundle
Format adapter	A description of how the values associated with datanames are encoded in a particular data format
Transfer specification	The combination of a format adapter with a dataname list
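
As an illustration of how the terms in Table 1 relate to one another, the following is a minimal Python sketch, not the paper's implementation. All class names, field names and the `check` helper are hypothetical, the meanings given to the datanames are placeholder text, and the sketch assumes that a format adapter can be reduced to a mapping from dataname to a location string. Relationships between datanames are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DataItem:
    """A dataname together with one or more associated values."""
    dataname: str
    values: Tuple


@dataclass
class Ontology:
    """Datanames and their meanings (relationships omitted in this sketch)."""
    meanings: Dict[str, str]


@dataclass
class FormatAdapter:
    """How values for each dataname are encoded in one data format.

    Assumption: an encoding is summarised as a location string.
    """
    data_format: str
    locations: Dict[str, str]


@dataclass
class TransferSpecification:
    """A format adapter combined with a dataname list."""
    adapter: FormatAdapter
    dataname_list: List[str]

    def check(self, ontology: Ontology):
        """Return (names missing from the ontology, names the adapter cannot encode)."""
        unknown = [n for n in self.dataname_list if n not in ontology.meanings]
        unencoded = [n for n in self.dataname_list if n not in self.adapter.locations]
        return unknown, unencoded


# A data bundle is a collection of data items; values here are invented.
bundle = [
    DataItem("scan ID", ("scan1",)),
    DataItem("axis ID", ("omega", "phi")),
    DataItem("axis type", ("rotation", "rotation")),
]

# The dataname list is the set of ontology datanames present in the bundle.
dataname_list = [item.dataname for item in bundle]

# Placeholder meanings, for illustration only.
ontology = Ontology(meanings={
    "scan ID": "identifier of a scan",
    "axis ID": "identifier of an axis",
    "axis type": "kind of motion an axis performs",
})

# A deliberately incomplete adapter: it has no entry for "axis type".
adapter = FormatAdapter("imgCIF", {"scan ID": "diffrn.id", "axis ID": "axis.id"})

spec = TransferSpecification(adapter, dataname_list)
unknown, unencoded = spec.check(ontology)
print(unknown, unencoded)
```

Keeping the dataname list separate from the adapter means the same list could be paired with an adapter for a different format without touching the ontology.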

Figure 1

A simple ontology using the olog formalism.

Figure 2

Adding a new dependency to an ontology. Adding a dependency on isotope requires defining a new dataname, denoted here by “isotopically pure experimental neutron scattering cross-section”.

Figure 3

Using pullbacks to relate restrictive definitions to broader definitions. The types on the top left of each square are pullbacks from the types beneath and to the right. Values for datanames corresponding to identifiers and functions in the top row can be automatically derived from data described using the bottom row, and *vice versa*.

Figure 4

Table 3 as a hierarchy. Six repetitions of dataname 1 values and four repetitions of dataname 2 values are removed, saving space.

Table 2

Datanames used in the NXmx-imgCIF translation example. The canonical names are those datanames used by the API. Intermediate datanames used during dictionary-based translation are not included. The imgCIF and NeXus columns identify equivalent locations in the respective formats, where they exist.

Canonical name	imgCIF	NXmx	Depends on	Comments
incident wavelength	diffrn_radiation_wavelength.wavelength	NXinstrument.NXmonochromator.wavelength	incident wavelength ID
wavelength ID	diffrn_radiation_wavelength.id	From order of appearance in NXbeam.wavelength
scan ID	diffrn.id	NXentry
frame axis location axis ID	diffrn_scan_frame_axis.axis_id	NXtransformations
frame axis location frame ID	diffrn_scan_frame_axis.frame_id	from order of appearance in NXtransformations.position
frame axis location angular position	diffrn_scan_frame_axis.angle	NXtransformations.position	frame axis location frame ID, frame axis location axis ID	NXmx actually uses an array attached to the group node. Due to limitations in the third-party NeXus access software, we have created a “position” field.
axis ID	axis.id
axis type	axis.type
axis vector	axis.vector
axis offset	axis.offset
{goniometer, simple detector} axis ID		NX{sample,detector}.NXtransformations		“Simple” denotes uncoupled axes
{goniometer, simple detector} axis vector mcstas		NX{sample,detector}.NXtransformations@vector	axis ID
{goniometer, simple detector} axis offset mcstas		NX{sample,detector}.NXtransformations@offset	axis ID
2D data identifier	array_data.binary_id
2D data			2D data identifier	Not the same as simple scan data as imgCIF data may have a variety of layouts or come from submodules
simple scan data		NXinstrument.NXdetector.NXdata.data	simple scan frame frame ID
simple scan frame frame ID		from order of appearance in NXdata.data
data axis ID		NXdetector.NXdata.axes		Must unpack to get values
data axis precedence		From ordering in NXdetector.NXdata.axes	data axis ID
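
The “Depends on” column of Table 2 implies an order in which values have to be obtained: a dataname can only be resolved once the datanames it depends on are available. The sketch below shows one way this could be handled for a few rows copied from the table. The dictionary layout and the functions `location` and `resolution_order` are hypothetical names introduced for illustration, not part of the API described in the paper. Dependencies are ordered with the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# canonical name -> (imgCIF location, NXmx location, datanames it depends on)
# Rows copied from Table 2; None marks an empty cell.
TABLE_2_ROWS = {
    "scan ID": ("diffrn.id", "NXentry", ()),
    "2D data identifier": ("array_data.binary_id", None, ()),
    "2D data": (None, None, ("2D data identifier",)),
    "simple scan frame frame ID": (
        None, "from order of appearance in NXdata.data", ()),
    "simple scan data": (
        None, "NXinstrument.NXdetector.NXdata.data",
        ("simple scan frame frame ID",)),
    "data axis ID": (None, "NXdetector.NXdata.axes", ()),
    "data axis precedence": (
        None, "From ordering in NXdetector.NXdata.axes", ("data axis ID",)),
}


def location(canonical_name, data_format):
    """Return the table entry for a canonical name in 'imgCIF' or 'NXmx'."""
    imgcif, nxmx, _ = TABLE_2_ROWS[canonical_name]
    return {"imgCIF": imgcif, "NXmx": nxmx}[data_format]


def resolution_order(requested):
    """Return the requested names plus their dependencies, dependencies first."""
    graph = {}
    pending = list(requested)
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        depends_on = TABLE_2_ROWS[name][2]
        graph[name] = set(depends_on)
        pending.extend(depends_on)
    # TopologicalSorter expects node -> predecessors, which matches "depends on".
    return list(TopologicalSorter(graph).static_order())


order = resolution_order(["simple scan data"])
print(order)
print(location("scan ID", "imgCIF"))
```

A topological sort also raises `graphlib.CycleError` if two datanames depend on each other, which would flag an inconsistency in the dependency column.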

Table 3

A hypothetical data table where datanames 4, 5 and 6 are each functions of the combined values of datanames 1, 2 and 3.

Dataname 1	Dataname 2	Dataname 3	Dataname 4	Dataname 5	Dataname 6
A	Q	X	1.1	T	‘blue’
A	Q	Y	2.1	F	‘green’
A	R	X	1.5	F	‘yellow’
A	R	Y	11.2	T	‘green’
B	Q	X	2.1	F	‘yellow’
B	Q	Y	3.1	T	‘pink’
B	R	X	4.5	F	‘yellow’
B	R	Y	–1.2	T	‘brown’
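
The saving obtained by presenting Table 3 as a hierarchy can be checked with a short sketch. The code below nests the rows of Table 3 by datanames 1, 2 and 3 and counts how many key values are stored at each level compared with the flat table. The function names `nest` and `keys_per_level` are hypothetical, and nested dictionaries are only one possible representation of such a hierarchy.

```python
# Rows of Table 3: datanames 1-3 are the keys, datanames 4-6 the dependent values.
ROWS = [
    ("A", "Q", "X", 1.1, "T", "blue"),
    ("A", "Q", "Y", 2.1, "F", "green"),
    ("A", "R", "X", 1.5, "F", "yellow"),
    ("A", "R", "Y", 11.2, "T", "green"),
    ("B", "Q", "X", 2.1, "F", "yellow"),
    ("B", "Q", "Y", 3.1, "T", "pink"),
    ("B", "R", "X", 4.5, "F", "yellow"),
    ("B", "R", "Y", -1.2, "T", "brown"),
]


def nest(rows, n_keys=3):
    """Build nested dicts keyed by the first n_keys columns of each row."""
    tree = {}
    for row in rows:
        node = tree
        for key in row[:n_keys - 1]:
            node = node.setdefault(key, {})
        node[row[n_keys - 1]] = row[n_keys:]
    return tree


def keys_per_level(tree, depth):
    """Count how many key values are stored at each level of the hierarchy."""
    counts = []
    level = [tree]
    for _ in range(depth):
        counts.append(sum(len(node) for node in level))
        level = [child for node in level for child in node.values()]
    return counts


tree = nest(ROWS)
stored = keys_per_level(tree, depth=3)

# In the flat table every key column holds one value per row.
saved = [len(ROWS) - n for n in stored]
print(stored, saved)
```

With these rows, the counts of stored keys per level are 2, 4 and 8, so six values of dataname 1 and four values of dataname 2 no longer need to be repeated, while dataname 3 is stored as often as before.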
