RDFAdaptor: Efficient ETL Plugins for RDF Data Process

Open Access | Apr 2021

Figures & Tables

Figure 1

RDFAdaptor framework.

Figure 2

Front-end interface screenshot.

Figure 3

Workflow of RDF data generation with RDFZier.

Figure 4

Configuration template of RDFTranslatorAndLoader.

Figure 5

Configuration template of SPARQLIn and SPARQLUpdate.

Figure 6

Dump all AGROVOC RDF triples from SPARQL endpoint to local files.

Parameters defined in RDFTranslatorAndLoader

| Group | Parameter | Description |
|---|---|---|
| Input | Source | RDF triples to be converted or loaded |
| Input | Source Type | data source, such as local file system, remote URL or string stream |
| Input | Source RDF Format | format of the input RDF data, fully supporting the common RDF formats |
| Input | Large Input Triples | a selector for whether the input data is large; if it is, the output step cannot count, merge or split the triples |
| Advance | BaseIRI | resolve against a Base IRI if the RDF data contains relative IRIs |
| Advance | BNode | a selector for preserving BNode IDs |
| Advance | Verify URI syntax, Verify relative URIs, Verify language tags, Verify datatypes | selectors for checking URI syntax, relative URIs, language tags and datatypes; a failure is logged when the corresponding errors occur |
| Advance | Language tags, Datatype | selectors for language tags and datatypes, including failing the parse if a language tag or datatype is not recognised, and normalizing recognised language tag and datatype values |
| Output | Target RDF Format | RDF format of the converted output |
| Output | Commit or Split Size | number of RDF triples written to each RDF file or submitted to the store in each batch; the default value is 0, which means all the input data is processed at one time |
| Output | Local File Setting | options for file system storage, including three selectors, “Save to File System”, “Keep Source FileName” and “Merge to Single File” (which takes precedence over “Commit or Split Size”), plus file name and location |
| Output | TripleStore Setting | options for the RDF store, including a selector for “Save to Store”, Triple Store, Server URL, Database/RepositoryID/NameSpace (identifier of the database, which differs by triple store), UserName, Password and Graph URI |
| Output | Stream setting | options for the string stream used for further data transfer, including a selector for “Save to Stream” and Result Field |
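
The “Commit or Split Size” parameter can be read as a simple batching rule. The following Python sketch is only an illustration of that rule, not the plugin's implementation; the function name `split_into_batches` is hypothetical, and the only behaviour taken from the table is that a size of 0 means all input is handled at one time.

```python
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def split_into_batches(triples: Iterable[Triple], size: int = 0) -> Iterator[List[Triple]]:
    """Yield lists of at most `size` triples; a size of 0 yields everything at once."""
    if size < 0:
        raise ValueError("size must be 0 or a positive integer")
    batch: List[Triple] = []
    for triple in triples:
        batch.append(triple)
        # Emit a full batch as soon as the limit is reached (never when size is 0).
        if size and len(batch) == size:
            yield batch
            batch = []
    # Emit the remainder, or the whole input when size is 0.
    if batch:
        yield batch
```

Each yielded batch would then correspond to one output file or one commit to the store, depending on the output settings chosen.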

Parameters defined in SparqlUpdate

| Group | Parameter | Description |
|---|---|---|
| SPARQL Setting | Query Endpoint Url From Field? | checkbox; if checked, the URL of the SPARQL query endpoint comes from a previous Kettle step and is read from the “Query Endpoint Url Field” |
| SPARQL Setting | Query Endpoint Url Field | drop-down list of input fields; only used when “Query Endpoint Url From Field” is selected |
| SPARQL Setting | Query Endpoint Url | value of the query endpoint URL, used when “Query Endpoint Url From Field” is unchecked |
| SPARQL Setting | Update Endpoint Url From Field? | checkbox; if checked, the URL of the SPARQL update endpoint comes from a previous Kettle step and is read from the “Update Endpoint Url Field” |
| SPARQL Setting | Update Endpoint Url Field | drop-down list of input fields; only used when “Update Endpoint Url From Field” is selected |
| SPARQL Setting | Update Endpoint Url | value of the update endpoint URL, used when “Update Endpoint Url From Field” is unchecked |
| SPARQL Setting | Query From Field? | checkbox; if checked, the SPARQL update query comes from a previous Kettle step and is read from the “Query Field Name” |
| SPARQL Setting | Query Field Name | only used when “Query From Field” is selected |
| SPARQL Setting | Base URI | resolve against a Base IRI if the RDF data contains relative IRIs |
| SPARQL Setting | SPARQL Update Query | JavaScript programming for graph update, only used when “Query From Field” is disabled |
| Output Setting | Result Field Name | field specified for file saving |
| Http Auth | HTTP UserID | user ID of the SPARQL endpoint, if any |
| Http Auth | HTTP Password | password of the SPARQL endpoint, if a UserID exists |
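
Several of these parameters follow the same pattern: a checkbox decides whether a value is read from a field of the incoming row or from a fixed setting. A minimal Python sketch of that pattern is given here; it is not the plugin's code, and the helper `resolve_setting`, the field names and the example URLs are all hypothetical.

```python
from typing import Mapping


def resolve_setting(row: Mapping[str, str], from_field: bool,
                    field_name: str, fixed_value: str) -> str:
    """Return the row's field value when from_field is set, otherwise the fixed value."""
    if from_field:
        if field_name not in row:
            raise KeyError(f"input row has no field named {field_name!r}")
        return row[field_name]
    return fixed_value


# Example row as it might arrive from a previous step (made-up field names and URLs).
row = {
    "endpoint": "http://example.org/repo/statements",
    "update": 'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }',
}

# Checkbox selected: the URL is taken from the row.
update_url = resolve_setting(row, True, "endpoint", "http://example.org/default/statements")
# Checkbox not selected: the fixed query text is used instead.
update_query = resolve_setting(row, False, "update", "CLEAR DEFAULT")
```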

RDF data generation/translation and loading

| Data Source | Data Format | Number of Records | Number of mapped fields | Number of RDF triples generated | Total time |
|---|---|---|---|---|---|
| MongoDB | json | 1,948,268 | 17 | 37,038,563 | 32min18s |
| SqlServer | RDB | 336,831 | 5 | 1,159,687 | 38.6s |
| SqlServer | RDB | 798,389 | 9 | 7,521,876 | 5min4s |
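
The rows can be compared more easily as throughput, that is, triples generated per second. The short Python sketch below does this arithmetic; it assumes the triple counts and times are read from the table as 37,038,563 in 32min18s, 1,159,687 in 38.6s and 7,521,876 in 5min4s, and the helper names are hypothetical.

```python
import re


def to_seconds(text: str) -> float:
    """Convert a duration such as '32min18s' or '38.6s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)min)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def triples_per_second(triples: int, duration: str) -> float:
    """Average number of triples produced per second over the whole run."""
    return triples / to_seconds(duration)


runs = [(37_038_563, "32min18s"), (1_159_687, "38.6s"), (7_521_876, "5min4s")]
rates = [round(triples_per_second(n, t)) for n, t in runs]
```

Under those readings the three runs come to roughly 19,100, 30,000 and 24,700 triples per second respectively. These are averages over the total time and say nothing about how the rate varies within a run.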

Parameters defined in RDFizer

| Group | Parameter | Description |
|---|---|---|
| Namespace | Prefix | collections of names identified by URI references |
| Namespace | Namespace | different prefixes depending on the required namespaces |
| Mapping Setting | Subject URI | HTTP URI template for the Subject/Resource; a placeholder {sid} is used and replaced by the UniqueKey |
| Mapping Setting | Class Types | the classes to which the resource belongs, supporting multiple class types (separated by semicolons), such as skos:Concept; foaf:Person |
| Mapping Setting | UniqueKey | the unique and stable primary key of the resource, part of the Subject URI |
| Mapping Setting | Fields Mapping Parameters | a list of field mappings from the selected data source to the target RDF schema, including the input Stream Field, Predicates, Object URIs, Multi-Values Separator, Data Type and Lang Tag |
| Dataset Metadata | Meta Subject URI | URI pattern of the generated dataset |
| Dataset Metadata | Meta Class Types | the classes to which the resource belongs |
| Dataset Metadata | Parameters | a list of descriptions of the generated dataset, including PropertyType, Predicates, Object Values, DataType and Lang Tag |
| Output Setting | File system setting | options for file system storage, including Filename and RDF format |
| Output Setting | RDF store setting | options for the RDF store, including triple store name, server URL, Repository ID, Username (if any), Password and Graph URI |
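
To make the mapping parameters concrete, the following Python sketch turns one input record into N-Triples lines. It is a simplified illustration and not the plugin's implementation: the function `rdfize_row` and its arguments are hypothetical, predicates and classes are given as full IRIs rather than prefixed names, every object is written as a plain literal, and the UniqueKey value is inserted into the template without any IRI encoding.

```python
from typing import List, Mapping, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rdfize_row(row: Mapping[str, object], subject_template: str, unique_key: str,
               class_types: str, field_map: Mapping[str, str],
               separator: Optional[str] = None) -> List[str]:
    """Map one record to N-Triples lines using a {sid} subject template."""
    # Replace the {sid} placeholder with the record's unique key.
    subject = subject_template.replace("{sid}", str(row[unique_key]))
    lines = []
    # One rdf:type triple per class; classes are separated by semicolons.
    for cls in class_types.split(";"):
        cls = cls.strip()
        if cls:
            lines.append(f"<{subject}> <{RDF_TYPE}> <{cls}> .")
    # One triple per mapped field value; multi-valued fields are split first.
    for field, predicate in field_map.items():
        raw = row.get(field)
        if raw is None or raw == "":
            continue
        values = str(raw).split(separator) if separator else [str(raw)]
        for value in values:
            value = value.strip()
            if value:
                lines.append(f'<{subject}> <{predicate}> "{escape_literal(value)}" .')
    return lines
```

For a record `{"id": 42, "label": "rice|paddy"}` with the template `http://example.org/concept/{sid}`, two class IRIs, one mapped field and the separator `|`, the function returns four lines: two type triples and two label triples.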

Parameters defined in SparqlIn

| Group | Parameter | Description |
|---|---|---|
| SPARQL Setting | Accept URL from field | checkbox; if checked, the URL of the SPARQL endpoint comes from a previous Kettle step and is read from the “URL field name” |
| SPARQL Setting | URL field name | drop-down list of input fields; only used when “Accept URL from field” is selected |
| SPARQL Setting | SPARQL Endpoint URL | endpoint URL queried when “Accept URL from field” is disabled |
| SPARQL Setting | Query Type | query type, with two options: Graph query or Tuple query |
| SPARQL Setting | SPARQL Query | SPARQL query forms: SELECT or CONSTRUCT |
| SPARQL Setting | Limit | limit on the amount of data to be processed, if necessary |
| SPARQL Setting | Offset | the starting position of data processing |
| Output Setting | Result Field Name | field specified for file saving |
| Output Setting | RDF Format | target local data format: JSON, XML, CSV or TSV for a SELECT query; RDF formats only for a CONSTRUCT query |
| Output Setting | Max Rows | maximum size of the output file; empty or 0 means get all the triples |
| Http Auth | HTTP UserID | user ID of the SPARQL endpoint, if any |
| Http Auth | HTTP Password | password of the SPARQL endpoint, if a UserID exists |
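
As an illustration of how Limit, Offset and the SELECT output formats could fit together, the Python sketch below builds (but does not send) a SPARQL Protocol GET request. It is not the plugin's code: the function names and the example endpoint are hypothetical, and it assumes that the query text does not already contain LIMIT or OFFSET and that the endpoint URL has no query string of its own. The media types are the standard ones for SPARQL query results.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Standard media types for SELECT results in the four formats named in the table.
SELECT_MEDIA_TYPES = {
    "JSON": "application/sparql-results+json",
    "XML": "application/sparql-results+xml",
    "CSV": "text/csv",
    "TSV": "text/tab-separated-values",
}


def page_query(query: str, limit: Optional[int] = None, offset: Optional[int] = None) -> str:
    """Append LIMIT and OFFSET clauses to a query when they are set and non-zero."""
    parts = [query.strip()]
    if limit:
        parts.append(f"LIMIT {int(limit)}")
    if offset:
        parts.append(f"OFFSET {int(offset)}")
    return "\n".join(parts)


def build_select_request(endpoint: str, query: str, result_format: str = "JSON",
                         limit: Optional[int] = None,
                         offset: Optional[int] = None) -> Request:
    """Build a GET request carrying the paged query and the matching Accept header."""
    paged = page_query(query, limit, offset)
    url = endpoint + "?" + urlencode({"query": paged})
    return Request(url, headers={"Accept": SELECT_MEDIA_TYPES[result_format]})
```

Increasing the offset by the limit on each call pages through a large result set; a stable ORDER BY in the query is generally needed for the pages to be consistent.
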
DOI: https://doi.org/10.2478/jdis-2021-0020 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X
Language: English
Page range: 123 - 145
Submitted on: Dec 18, 2020
Accepted on: Mar 9, 2021
Published on: Apr 14, 2021
Published by: Chinese Academy of Sciences, National Science Library
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2021 Jiao Li, Guojian Xian, Ruixue Zhao, Yongwen Huang, Yuantao Kou, Tingting Luo, Tan Sun, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.