
The Challenge of Ensuring Persistency of Identifier Systems in the World of Ever-Changing Technology

Open Access | Apr 2017

Figures & Tables

Table 1

The extent to which the PID Service implements the four Pillars.

Pillar | PID Service Implementation
Identifier Independence | Cannot enforce syntax policy, so: Avoid organisation names – not enforced; Avoid technology references – not enforced.
Locked to a single protocol (HTTP URIs) so unable to Avoid resolution protocol indicators. Uses a User Interface that will remove characters problematic for well-known protocols.
Does Avoid visual ambiguity and use a well-known character set – implements UTF-8.
Does Define which, if any, pattern matching system they use – the regex system in use is documented.
Delivering Essential PID Functions | Does Issue identifiers – part of the tool’s User Interface and API.
  • Uniqueness – enforced by relationship (hierarchy) and resolution checking.

  • Ownership – recorded for every PID.

  • Editable – PID metadata editable via UI & API.

Does Store identifiers – the tool uses a database.
  • Scalability – potentially limitless (given the use of a robust, scalable database).

  • Integrity – a duty for the implementer.

  • Interpretability – documented by the data model and system documentation.

  • Versioning – automatically captured and stored.

Does Resolve identifiers – if installed as recommended with web server functionality providing access.
Separation from Data Delivery | Inherent: no ability for the system to deliver data.
Employing policies for change | Mostly a task for the implementers, however:
  • Technology change – can be decoupled from a specific database and is loosely coupled to a front-end web server, so certain components may easily be changed.

  • Social change – unable to be addressed by a system.

  • Identifier abandonment – identity of each identifier’s owner stored. System admin has access to all.

  • Financial sustainability – not addressed. The project was originally funded for development and some early adoption but no general funding for community development or use is yet arranged. Individual institutions have funded instances of the system in place.

  • Decommissioning – documented and comprehensive export formats (the XML shown in Figure 3) assist with this.

Figure 1

Two simple classes for HTTP URI redirection objects: a) simple URI redirection properties of a source pattern (src), a destination URI (dest) and an HTTP Status Code (type) (Soiland-Reyes, 2016a); b) URI redirection for PURLs with some ownership metadata. ‘URI’ objects are valid HTTP URIs according to RFC 2616 (Fielding et al., 2016) and ‘Status Code’ objects are valid Status Codes according to the same specification.
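
The two classes in Figure 1 can be sketched as Python dataclasses. This is only an illustration under stated assumptions: the `src`, `dest` and `type` names come from the caption, while the `owner` field, the helper `_check_http_uri` and the simplified validation rules are hypothetical and not taken from the original model.

```python
from dataclasses import dataclass
from http import HTTPStatus
from urllib.parse import urlsplit


def _check_http_uri(uri: str) -> str:
    """Accept only absolute http/https URIs (a simplification of full URI validation)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an HTTP URI: {uri!r}")
    return uri


@dataclass(frozen=True)
class Redirect:
    """Class (a): a source pattern, a destination URI and an HTTP Status Code."""
    src: str
    dest: str
    type: int

    def __post_init__(self) -> None:
        _check_http_uri(self.dest)
        HTTPStatus(self.type)  # raises ValueError for an unregistered status code


@dataclass(frozen=True)
class PurlRedirect(Redirect):
    """Class (b): the same redirect plus ownership metadata (field name assumed)."""
    owner: str = "unknown"
```

Validating in `__post_init__` means an object that violates the caption's constraints cannot be constructed at all, which keeps later resolution code free of defensive checks.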

Table 2

Examples of data for the class model in Figure 1b (Soiland-Reyes, 2016b).

id | path | type | target | created | last_modified | status | indexed
1 | /example/path | 302 | http://example.com/redirectedPath | 2016-02-29 T13:08:11 | 2016-02-29 T14:08:11 | OK | 1
2 | /example/path/deeper | 302 | http://example.com/redirectedDeeper | 2016-02-29 T13:09:11 | 2016-02-29 T14:09:11 | OK | 1
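
A resolver could use the rows of Table 2 roughly as follows. This is a minimal sketch that assumes exact-match lookup on `path`; the `Row` type, the `BY_PATH` index and the `resolve` function are hypothetical names, and the timestamp, status and indexed columns are left out for brevity.

```python
from typing import NamedTuple, Optional, Tuple


class Row(NamedTuple):
    id: int
    path: str
    type: int    # HTTP status code to answer with
    target: str  # URI to redirect to


# The two rows of Table 2.
ROWS = [
    Row(1, "/example/path", 302, "http://example.com/redirectedPath"),
    Row(2, "/example/path/deeper", 302, "http://example.com/redirectedDeeper"),
]

# Index by path so each lookup is a single dictionary access.
BY_PATH = {row.path: row for row in ROWS}


def resolve(path: str) -> Optional[Tuple[int, str]]:
    """Return (status code, target URI) for a known path, or None if unknown."""
    row = BY_PATH.get(path)
    if row is None:
        return None
    return row.type, row.target
```

For example, `resolve("/example/path/deeper")` yields the status code and target of the second row, and an unknown path yields `None`.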
Figure 2

Part of the HTTP URI PIM implemented by the PID Service. The ‘URN’ object is a Uniform Resource Name (Moats, 1997) and the MappingInstance ‘type’ property has a shorthand notation indicating allowed values of either ‘Regex’ or ‘1-to-1’. The Mapping-to-Mapping instance relationships, which enable a strict Mapping hierarchy, are not shown.
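
The two allowed values of the MappingInstance ‘type’ property suggest two matching behaviours, sketched below. The `path` field name, the `matches` method and the choice of whole-string regular expression matching are assumptions made for illustration; the caption does not say how the PID Service anchors its patterns, and the example paths are hypothetical.

```python
import re
from dataclasses import dataclass

ALLOWED_TYPES = ("Regex", "1-to-1")


@dataclass(frozen=True)
class MappingInstance:
    path: str  # a literal path for "1-to-1", a regular expression for "Regex"
    type: str  # one of ALLOWED_TYPES

    def __post_init__(self) -> None:
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"type must be one of {ALLOWED_TYPES}")

    def matches(self, request_path: str) -> bool:
        """Literal comparison for 1-to-1, whole-string regex match otherwise."""
        if self.type == "1-to-1":
            return request_path == self.path
        return re.fullmatch(self.path, request_path) is not None
```

A 1-to-1 mapping for `/dataset/69674` would match only that exact path, whereas a Regex mapping such as `/dataset/\d+` would match any numeric dataset path.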

Figure 3

An XML-serialised instance of the PID Service’s PID PIM for a Mapping. Shown are the default actions (redirection to an HTML page) as well as pattern-based conditional redirects for a pseudo file extension (.ttl) and an HTTP Accept header set to the MIME type text/turtle, both of which redirect to Turtle RDF serialisations (W3C, 2014) of the same resource presented in the HTML default case. ‘Turtle’ is a W3C recommendation for serialising Resource Description Framework (RDF) resources.
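
The redirect logic described for Figure 3 can be approximated in a few lines. This sketch is not the PID Service's implementation: the two target URIs and the function name `choose_redirect` are hypothetical, and Accept header quality values (`q=`) are ignored for simplicity.

```python
HTML_TARGET = "http://example.org/resource.html"   # hypothetical default target
TURTLE_TARGET = "http://example.org/resource.ttl"  # hypothetical Turtle target


def choose_redirect(path: str, accept: str = "") -> str:
    """Pick a redirect target from the request path and Accept header."""
    # Condition 1: the pseudo file extension .ttl requests Turtle explicitly.
    if path.endswith(".ttl"):
        return TURTLE_TARGET
    # Condition 2: the Accept header lists the text/turtle MIME type.
    media_types = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    if "text/turtle" in media_types:
        return TURTLE_TARGET
    # Default action: redirect to the HTML page.
    return HTML_TARGET
```

Both conditions lead to the same Turtle serialisation, so a client can ask for it either through the URI itself or through content negotiation.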

Figure 4

A proposed Platform Independent Model of a PID’s metadata.

Figure 5

Object Model of the PID PIM for a Geoscience Australia IGSN identifier (igsn:10273/AU239).

Figure 6

Object Model of the PID PIM for a Geoscience Australia HTTP URI identifier (dataset no. 69674).

Table 3

The extent to which the identifier system in Figure 5 at Geoscience Australia implements the four Pillars.

Pillar | PID Service Implementation
Identifier Independence | Does Avoid organisation names – through the implementation of a non-organisation-specific domain name. Does Avoid technology references – by governance of PID patterns.
Does Avoid resolution protocol indicators – and implements two resolution protocols. Uses a User Interface that prevents characters problematic for well-known protocols.
Does Avoid visual ambiguity and use a well-known character set – UTF-8.
Does Define which, if any, pattern matching system they use.
Delivering Essential PID Functions | Does Issue identifiers – the implementation’s primary task:
  • Uniqueness – enforced locally by a lower-level system (database primary key) and globally through adherence to the IGSN community procedures.

  • Ownership – recorded for every PID (fairly trivial for a single-agency implementation).

  • Editable – PID metadata editable via UI & API.

Does Store identifiers – the tool uses a corporate database:
  • Scalability – potentially limitless, using a large-scale corporate database.

  • Integrity – implemented with normal corporate database procedures.

  • Interpretability – documented by the data model and system documentation.

  • Versioning – captured and stored but only exposed to administrator users.

Does Resolve identifiers – using two protocols.
Separation from Data Delivery | Inherent: no ability for the system to deliver data.
Employing policies for change | Technology change – the system has been set up with adaptation to technology change in mind, as per a corporate policy at GA. As such, the resolver mechanism and the identifier data store are loosely coupled. The identifier data store is mapped to the PIM and exportable according to that data model.
Social change – the system has an institutional owner and is thus able to handle individual staffing changes (Custodians). Publisher change is catered for to some extent by GA's participation in the IGSN network, which means that copies of the identifiers and a limited version of their metadata are replicated to other organisations (such as CSIRO, the organisation of two of the authors).
Identifier abandonment – identity of each identifier’s Custodian is captured. System admin has access to all.
Financial sustainability – not addressed. The implementation assumes an on-going institutional budget both for internal system management and adherence to the IGSN community. This is a major weakness of this system.
Decommissioning – documented and comprehensive export formats (the XML shown in Figure 3) assist with this, and change procedures are in place for each element of the system in accordance with corporate systems policy.
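
Table 3 states that local uniqueness is enforced by a database primary key. A minimal sketch of that idea, using an in-memory SQLite database, is given below. The table name `pid`, its columns and the `issue` function are hypothetical and say nothing about Geoscience Australia's actual schema; global uniqueness through the IGSN community procedures is outside the scope of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY constraint makes the database itself reject duplicates.
conn.execute(
    "CREATE TABLE pid (identifier TEXT PRIMARY KEY, custodian TEXT NOT NULL)"
)


def issue(identifier: str, custodian: str) -> bool:
    """Record a new identifier; return False if it has already been issued."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO pid (identifier, custodian) VALUES (?, ?)",
                (identifier, custodian),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Issuing `igsn:10273/AU239` once succeeds; a second attempt with the same identifier is refused by the constraint rather than by application code.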
Language: English
Submitted on: Nov 16, 2016
Accepted on: Mar 10, 2017
Published on: Apr 4, 2017
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2017 Nicholas J. Car, Pavel Golodoniuc, Jens Klump, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.