
Are Scientific Data Repositories Coping with Research Data Publishing?

Open Access | Apr 2016

Figures & Tables

Table 1

Scientific Data Repositories studied.

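| Repository | Type | Founded | Country | Software | Cert. |
| --- | --- | --- | --- | --- | --- |
| 3TU.Datacentrum | Institution | 2008 | NLD | In-house | a |
| CSIRO DAP | Institution | 2011 | AUS | In-house | |
| Dryad | Organization | 2008 | USA | DSpace | |
| Figshare | Company | 2011 | GBR | In-house | |
| Zenodo | Organization | 2013 | CHE | Invenio | |

[a] Data Seal of Approval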

Table 2

Datasets published by Scientific Data Repositories.

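| Repository | up to 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3TU.Datacentrum | 1692 | 446 | 379 | 345 | 371 | 296 | 3529 |
| CSIRO DAP | 0 | 46 | 62 | 438 | 454 | 418 | 1418 |
| Dryad | 493 | 773 | 1309 | 1990 | 2687 | 2424 | 9676 |
| Figshare | 0 | 16,929 | 28,224 | 108,221 | 94,223 | 72,818 | 320,415 |
| Zenodo | 99 | 24 | 68 | 43 | 268 | 1107 | 1609 |
| Total | 2284 | 18,218 | 30,042 | 111,037 | 98,003 | 77,063 | 336,647 |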
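
The totals in Table 2 can be re-derived from the per-year counts. The following is a minimal sketch that transcribes the figures from the table by hand and recomputes the "Total" column, the "Total" row, and the grand total; the variable names are illustrative and not taken from the article.

```python
# Per-year dataset counts transcribed by hand from Table 2.
# Column order: up to 2010, 2011, 2012, 2013, 2014, 2015.
YEARS = ["up to 2010", "2011", "2012", "2013", "2014", "2015"]
COUNTS = {
    "3TU.Datacentrum": [1692, 446, 379, 345, 371, 296],
    "CSIRO DAP": [0, 46, 62, 438, 454, 418],
    "Dryad": [493, 773, 1309, 1990, 2687, 2424],
    "Figshare": [0, 16929, 28224, 108221, 94223, 72818],
    "Zenodo": [99, 24, 68, 43, 268, 1107],
}

# Sum across years for each repository (the "Total" column).
row_totals = {repo: sum(values) for repo, values in COUNTS.items()}

# Sum across repositories for each year (the "Total" row).
column_totals = [sum(column) for column in zip(*COUNTS.values())]

# Sum of all per-repository totals (bottom-right cell).
grand_total = sum(row_totals.values())

if __name__ == "__main__":
    for repo, total in row_totals.items():
        print(f"{repo}: {total}")
    print(dict(zip(YEARS, column_totals)))
    print("Grand total:", grand_total)
```

Recomputing the sums this way is a cheap consistency check when the table is copied from a web page, where column boundaries are easily lost.
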
Table 3

Files published by Scientific Data Repositories.

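| Repository | Total | Average | Minimum | Maximum |
| --- | --- | --- | --- | --- |
| 3TU.Datacentrum | 3,833 | 1 | 1 | 18 |
| CSIRO DAP | 549,514 | 388 | 0 | 25,804 |
| Dryad | 30,814 | 3 | 1 | 96 |
| Figshare | 320,922 | 1 | 0 | 443 |
| Zenodo | 11,791 | 7 | 1 | 3,963 |
| Total | 916,874 | | | |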
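
The "Average" column of Table 3 appears consistent with dividing each repository's total number of files by its total number of datasets (Table 2) and rounding to the nearest integer. The page does not state how the average was computed, so that reading is an assumption; the sketch below only checks it against figures transcribed by hand from the two tables, using illustrative names.

```python
# Total datasets per repository (Table 2, "Total" column).
DATASETS = {
    "3TU.Datacentrum": 3529,
    "CSIRO DAP": 1418,
    "Dryad": 9676,
    "Figshare": 320415,
    "Zenodo": 1609,
}

# Total files per repository (Table 3, "Total" column).
FILES = {
    "3TU.Datacentrum": 3833,
    "CSIRO DAP": 549514,
    "Dryad": 30814,
    "Figshare": 320922,
    "Zenodo": 11791,
}

# Averages as reported in Table 3.
REPORTED_AVERAGE = {
    "3TU.Datacentrum": 1,
    "CSIRO DAP": 388,
    "Dryad": 3,
    "Figshare": 1,
    "Zenodo": 7,
}


def files_per_dataset(repo: str) -> float:
    """Assumed definition: total files divided by total datasets."""
    return FILES[repo] / DATASETS[repo]


averages = {repo: files_per_dataset(repo) for repo in FILES}

if __name__ == "__main__":
    for repo, value in averages.items():
        print(f"{repo}: {value:.2f} (reported: {REPORTED_AVERAGE[repo]})")
```
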
Table 4

Top 10 subjects associated with published datasets. The following subjects are abridged to reduce the table size: ‘dts’ is ‘Distributed Temperature Sensing’; ‘stream water t.’ is ‘stream water temperature’; ‘000 comp. sci.’ is ‘000 Computer science, knowledge & systems’; ‘apsp’ is ‘APSP, all-pairs shortest paths’; ‘stp’ is ‘STP, simple temporal problem’; ‘stn’ is ‘STN, simple temporal network’; ‘hydra. eng.’ is ‘hydraulic engineering’; ‘comp. bin.…’ is ‘compact binaries and/or black-holes’; ‘interstellar…’ is ‘interstellar medium in and around the Milky Way’; ‘pop. gen. – emp.’ is ‘population genetics – empirical’; ‘ecological gen.’ is ‘Ecological Genetics’; ‘cons. genet.’ is ‘Conservation Genetics’; ‘cell biol.’ is ‘Cell Biology’; ‘prog. esta.’ is ‘Programas estadìsticos’; ‘3D doc.’ is ‘3D Documentation’; ‘Amer. South.’ is ‘American Southeast’; ‘web craw.’ is ‘web crawling’. Some of the CSIRO subjects are classification codes from the Australian and New Zealand Standard Research Classification (ANZSRC), e.g., 020199 Astronomical and Space Sciences not elsewhere classified.

| Subj. | 3TU.Datacentrum | CSIRO | Dryad | Figshare | Zenodo |
| --- | --- | --- | --- | --- | --- |
| #1 | n/a (3141 – 67.2%) | 020199 (690 – 6.6%) | n/a (606 – 1.2%) | biological sciences (155,196 – 23%) | n/a (445 – 10.1%) |
| #2 | dts (37 – 0.8%) | pulsars (423 – 4.1%) | adaptation (529 – 1.1%) | medicine (60,866 – 9.3%) | matriz de datos (234 – 5.3%) |
| #3 | stream water t. (37 – 0.8%) | neutron stars (416 – 4%) | pop. gen. – emp. (429 – 0.9%) | genetics (33,080 – 5.1%) | prog. esta. (231 – 5.2%) |
| #4 | 530 physics (23 – 0.5%) | comp. bin.… (247 – 2.4%) | speciation (326 – 0.7%) | biotechnology (28,470 – 4.4%) | PowerTAC (138 – 3.1%) |
| #5 | 000 comp. sci. (21 – 0.4%) | Australia (229 – 2.2%) | ecological gen. (281 – 0.6%) | ecology (23,398 – 3.6%) | 3D (90 – 2%) |
| #6 | apsp (19 – 0.4%) | interstellar… (186 – 1.8%) | phylogeography (263 – 0.5%) | biochemistry (21,638 – 3.3%) | archaeology (89 – 2%) |
| #7 | shortest path (19 – 0.4%) | adaptation (126 – 1.2%) | hybridization (222 – 0.5%) | infectious diseases (21,561 – 3.3%) | 3D doc. (88 – 2%) |
| #8 | stp (19 – 0.4%) | climate change (117 – 1.1%) | insects (219 – 0.4%) | science policy (21,324 – 3.3%) | caddo (88 – 2%) |
| #9 | stn (19 – 0.4%) | pulsar (112 – 1%) | cons. genet. (217 – 0.5%) | uncategorized (21,079 – 3.2%) | Amer. South. (88 – 2%) |
| #10 | hydra. eng. (19 – 0.4%) | alien plant (108 – 1%) | fish (167 – 0.4%) | cell biol. (18,780 – 2.9%) | web craw. (38 – 0.9%) |
| Distinct | 740 | 1940 | 19,829 | 299 | 1977 |
Table 5

Top 10 formats associated with published datasets. Each format is given as MIME type – file extension; ‘app.’ abbreviates ‘application’.

| Format | 3TU.Datacentrum | CSIRO | Dryad | Figshare | Zenodo |
| --- | --- | --- | --- | --- | --- |
| #1 | app./x-netcdf (3070 – 80%) | app./fits – sf (143,376 – 26%) | text/plain – txt (4926 – 16%) | n/a – xls (267,222 – 83.3%) | n/a – sav (1798 – 15.2%) |
| #2 | app./zip (559 – 14.6%) | image/png – png (95,072 – 17.3%) | Excel 2007 – xlsx (3793 – 12.3%) | n/a – pdf (16,996 – 5.3%) | n/a – txt (1243 – 10.5%) |
| #3 | text/plain (57 – 1.5%) | app./fits – rf (94,028 – 17.1%) | text/csv – csv (3099 – 10%) | n/a (12,968 – 4%) | n/a – png (1059 – 9%) |
| #4 | app./octet-stream (27 – 0.7%) | app./fits – FTp (92,888 – 16.9%) | app./zip – zip (2834 – 9.2%) | n/a – docx (4868 – 1.5%) | n/a – fits (1043 – 8.8%) |
| #5 | app./x-hdf5 (22 – 0.6%) | app./fits – cf (82,204 – 14.9%) | Excel – xls (2074 – 6.7%) | n/a – doc (4511 – 1.4%) | n/a – zip (878 – 7.4%) |
| #6 | app./x-gzip (19 – 0.5%) | n/a – adf (9833 – 1.8%) | n/a – n/a (2007 – 6.5%) | n/a – xlsx (4012 – 1.2%) | n/a – gz (616 – 5.2%) |
| #7 | video/x-msvideo (10 – 0.3%) | n/a – dat (4920 – 0.9%) | text/plain – nex (1191 – 3.9%) | app./zip – zip (1988 – 0.6%) | n/a – csv (532 – 4.5%) |
| #8 | video/mpeg (9 – 0.2%) | n/a – nit (4911 – 0.9%) | app./pdf – pdf (1097 – 3.6%) | n/a – csv (1422 – 0.4%) | n/a – csv (269 – 2.3%) |
| #9 | app./x-gzip (8 – 0.2%) | image/tiff – tif (3926 – 0.7%) | app./x-gzip – gz (734 – 2.4%) | n/a – jpg (1395 – 0.4%) | n/a – itp (260 – 2.2%) |
| #10 | app./zip (4 – 0.1%) | n/a – 001 (1637 – 0.3%) | app./x-fasta – fasta (728 – 2.4%) | n/a – cif (1108 – 0.3%) | n/a – ods (205 – 1.7%) |
| Distinct | 53 | 1876 | 868 | 524 | 961 |
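
To illustrate the "MIME type – file extension" labels used in Table 5, the sketch below derives such a label from a file name and ranks labels by frequency. It is not the authors' procedure: the repositories report their own format metadata, whereas this example guesses the MIME type with Python's standard `mimetypes` module, and the function names `format_label` and `top_formats` are hypothetical.

```python
import mimetypes
from collections import Counter
from pathlib import Path


def format_label(filename: str) -> str:
    """Return 'mime type – extension', using 'n/a' for whatever is unknown."""
    mime, _encoding = mimetypes.guess_type(filename)
    extension = Path(filename).suffix.lstrip(".")
    return f"{mime or 'n/a'} – {extension or 'n/a'}"


def top_formats(filenames, n=10):
    """Rank format labels by count; each entry is (label, count, percent)."""
    counts = Counter(format_label(name) for name in filenames)
    total = sum(counts.values())
    return [
        (label, count, round(100 * count / total, 1))
        for label, count in counts.most_common(n)
    ]


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    sample = ["notes.txt", "readme.txt", "README"]
    for label, count, percent in top_formats(sample):
        print(f"{label} ({count} – {percent}%)")
```

A file with neither a recognisable MIME type nor an extension ends up labelled "n/a – n/a", which is one way such a label can arise in a ranking of this kind.
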
Table 6

Dataset attributes supported by Scientific Data Repositories.

3TU.Dat. | CSIRO | Dryad | Figshare | Zenodo
Availability
Bibliometric data
Coverage
Date
Format
License
Minimal description
Paper reference
Project
Provenance
Subjects
Table 7

Top 5 licences associated with published datasets. ‘(c) CiTG Delft’ is ‘Delft University of Technology, Civil Engineering and Geosciences’; ‘Delft, KWR’ is ‘Delft University of Technology, KWR Watercycle Research Institute’; ‘openAccess’ is ‘info:eu-repo/semantics/openAccess’; ‘closedAccess’ is ‘info:eu-repo/semantics/closedAccess’.

| Licence | 3TU.Dat. | CSIRO | Dryad | Figshare | Zenodo |
| --- | --- | --- | --- | --- | --- |
| #1 | n/a (3453 – 97.85%) | CC-BY 3.0 (870 – 61.35%) | CC0 1.0 (29,025 – 94.19%) | n/a (308,108 – 96.16%) | CC0 1.0 (1041 – 64.7%) |
| #2 | CC BY-SA 3.0 (22 – 0.62%) | CSIRO Data Licence (328 – 23.13%) | n/a (1745 – 5.66%) | CC-BY (12,262 – 3.83%) | CC BY 4.0 (251 – 15.6%) |
| #3 | (c) CITG Delft (18 – 0.51%) | CC-BY 4.0 (83 – 5.85%) | Unknown (8 – 0.02%) | CC0 (41 – 0.01%) | openAccess (175 – 10.88%) |
| #4 | Delft, KWR (16 – 0.45%) | No Licence (47 – 3.31%) | Custom (1 – n/a) | Apache-2.0 (2 – n/a) | CC BY-SA 4.0 (72 – 4.47%) |
| #5 | Public (12 – 0.34%) | CC BY-NC-ND 3.0 (45 – 3.17%) | Custom (1 – n/a) | GPL-3.0 (1 – n/a) | closedAccess (50 – 3.11%) |
| Distinct | 10 | 10 | 39 | 6 | 7 |
Table 8

Dataset discovery facilities. The CSIRO OAI-PMH facility is actually offered via the Research Data Australia service.

End-user Facilities
3TU.Dat. | CSIRO | Dryad | Figshare | Zenodo
Keyword-based
Field-based
Browse
Other
Web-based API and Protocols
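| Facility | 3TU.Dat. | CSIRO | Dryad | Figshare | Zenodo |
| --- | --- | --- | --- | --- | --- |
| Harvesting | OAI-PMH | OAI-PMH | OAI-PMH | In-house | OAI-PMH |
| Search | n/a | In-house | n/a | In-house | n/a |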
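
For readers unfamiliar with OAI-PMH, the harvesting protocol listed in Table 8, the sketch below shows what a `ListRecords` request URL looks like and how record identifiers can be read from a response. The base URL, the helper names, and the sample response are made up for illustration; the `verb`, `metadataPrefix` and `resumptionToken` arguments and the XML namespace are those defined by OAI-PMH 2.0. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix
        # when fetching the next page of an incomplete list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Extract the record identifiers from a ListRecords response body."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hand-written sample response (hypothetical repository and identifier).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2015-01-01</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL, parse each page, and repeat the request with the returned resumption token until none is supplied.
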
Table 9

Dataset citation practices supported by Scientific Data Repositories.

3TU.Dat. | CSIRO | Dryad | Figshare | Zenodo
Citation string
Export option
Embed option
Share option
Language: English
Published on: Apr 26, 2016
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2016 Massimiliano Assante, Leonardo Candela, Donatella Castelli, Alice Tani, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.