Table 1
Scientific Data Repositories studied.
| Repository | Type | Founded | Country | Software | Cert. |
|---|---|---|---|---|---|
| 3TU.Datacentrum | Institution | 2008 | NLD | In-house | ✓ (a) |
| CSIRO DAP | Institution | 2011 | AUS | In-house | |
| Dryad | Organization | 2008 | USA | DSpace | |
| Figshare | Company | 2011 | GBR | In-house | |
| Zenodo | Organization | 2013 | CHE | Invenio | |
(a) Data Seal of Approval.
Table 2
Datasets published by Scientific Data Repositories.
| Repository | up to 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | Total |
|---|---|---|---|---|---|---|---|
| 3TU.Datacentrum | 1692 | 446 | 379 | 345 | 371 | 296 | 3529 |
| CSIRO DAP | 0 | 46 | 62 | 438 | 454 | 418 | 1418 |
| Dryad | 493 | 773 | 1309 | 1990 | 2687 | 2424 | 9676 |
| Figshare | 0 | 16,929 | 28,224 | 108,221 | 94,223 | 72,818 | 320,415 |
| Zenodo | 99 | 24 | 68 | 43 | 268 | 1107 | 1609 |
| Total | 2284 | 18,218 | 30,042 | 111,037 | 98,003 | 77,063 | 336,647 |
Table 3
Files published by Scientific Data Repositories.
| Repository | Total | Average | Minimum | Maximum |
|---|---|---|---|---|
| 3TU.Datacentrum | 3833 | 1 | 1 | 18 |
| CSIRO DAP | 549,514 | 388 | 0 | 25,804 |
| Dryad | 30,814 | 3 | 1 | 96 |
| Figshare | 320,922 | 1 | 0 | 443 |
| Zenodo | 11,791 | 7 | 1 | 3963 |
| Total | 916,874 | | | |
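The Average column is consistent with dividing each repository's file total (Table 3) by its dataset total (Table 2) and rounding to the nearest integer. The following is a minimal sketch of that check, assuming this is how the averages were derived; the function name `average_files_per_dataset` is illustrative and does not come from the study.

```python
# Dataset totals copied from Table 2 and file totals copied from Table 3.
datasets = {
    "3TU.Datacentrum": 3529,
    "CSIRO DAP": 1418,
    "Dryad": 9676,
    "Figshare": 320415,
    "Zenodo": 1609,
}
files = {
    "3TU.Datacentrum": 3833,
    "CSIRO DAP": 549514,
    "Dryad": 30814,
    "Figshare": 320922,
    "Zenodo": 11791,
}


def average_files_per_dataset(files, datasets):
    """Return files per dataset for each repository, rounded to an integer.

    Assumption: the published averages are simple ratios of the two totals.
    """
    return {name: round(files[name] / datasets[name]) for name in datasets}


averages = average_files_per_dataset(files, datasets)
total_files = sum(files.values())

for name, value in averages.items():
    print(f"{name}: {value} files per dataset")
print(f"Total files: {total_files}")
```

Under that assumption the computed values reproduce the Average column and the overall file total reported in Table 3.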
Table 4
Top 10 subjects associated with published datasets. The following subjects are abridged to reduce the table size: ‘dts’ is ‘Distributed Temperature Sensing’; ‘stream water t.’ is ‘stream water temperature’; ‘000 comp. sci.’ is ‘000 Computer science, knowledge & systems’; ‘apsp’ is ‘APSP, all-pairs shortest paths’; ‘stp’ is ‘STP, simple temporal problem’; ‘stn’ is ‘STN, simple temporal network’; ‘hydra. eng.’ is ‘hydraulic engineering’; ‘comp. bin.…’ is ‘compact binaries and/or black-holes’; ‘interstellar…’ is ‘interstellar medium in and around the Milky Way’; ‘pop. gen. – emp.’ is ‘population genetics – empirical’; ‘ecological gen.’ is ‘Ecological Genetics’; ‘cons. genet.’ is ‘Conservation Genetics’; ‘cell biol.’ is ‘Cell Biology’; ‘prog. esta.’ is ‘Programas estadísticos’; ‘3D doc.’ is ‘3D Documentation’; ‘Amer. South.’ is ‘American Southeast’; ‘web craw.’ is ‘web crawling’. Some of the CSIRO subjects are classification codes from the Australian and New Zealand Standard Research Classification (ANZSRC), e.g., 020199 Astronomical and Space Sciences not elsewhere classified.
| Subj. | 3TU.Datacentrum | CSIRO | Dryad | Figshare | Zenodo |
|---|---|---|---|---|---|
| #1 | n/a (3141 – 67.2%) | 020199 (690 – 6.6%) | n/a (606 – 1.2%) | biological sciences (155,196 – 23%) | n/a (445 – 10.1%) |
| #2 | dts (37 – 0.8%) | pulsars (423 – 4.1%) | adaptation (529 – 1.1%) | medicine (60,866 – 9.3%) | matriz de datos (234 – 5.3%) |
| #3 | stream water t. (37 – 0.8%) | neutron stars (416 – 4%) | pop. gen. – emp. (429 – 0.9%) | genetics (33,080 – 5.1%) | prog. esta. (231 – 5.2%) |
| #4 | 530 physics (23 – 0.5%) | comp. bin.… (247 – 2.4%) | speciation (326 – 0.7%) | biotechnology (28,470 – 4.4%) | PowerTAC (138 – 3.1%) |
| #5 | 000 comp. sci. (21 – 0.4%) | Australia (229 – 2.2%) | ecological gen. (281 – 0.6%) | ecology (23,398 – 3.6%) | 3D (90 – 2%) |
| #6 | apsp (19 – 0.4%) | interstellar… (186 – 1.8%) | phylogeography (263 – 0.5%) | biochemistry (21,638 – 3.3%) | archaeology (89 – 2%) |
| #7 | shortest path (19 – 0.4%) | adaptation (126 – 1.2%) | hybridization (222 – 0.5%) | infectious diseases (21,561 – 3.3%) | 3D doc. (88 – 2%) |
| #8 | stp (19 – 0.4%) | climate change (117 – 1.1%) | insects (219 – 0.4%) | science policy (21,324 – 3.3%) | caddo (88 – 2%) |
| #9 | stn (19 – 0.4%) | pulsar (112 – 1%) | cons. genet. (217 – 0.5%) | uncategorized (21,079 – 3.2%) | Amer. South. (88 – 2%) |
| #10 | hydra. eng. (19 – 0.4%) | alien plant (108 – 1%) | fish (167 – 0.4%) | cell biol. (18,780 – 2.9%) | web craw. (38 – 0.9%) |
| Distinct | 740 | 1940 | 19,829 | 299 | 1977 |
Table 5
Top 10 formats associated with published datasets. Each format is given as MIME type – file extension; ‘app.’ abbreviates ‘application’.
| Format | 3TU.Datacentrum | CSIRO | Dryad | Figshare | Zenodo |
|---|---|---|---|---|---|
| #1 | app./x-netcdf (3070 – 80%) | app./fits – sf (143,376 – 26%) | text/plain – txt (4926 – 16%) | n/a – xls (267,222 – 83.3%) | n/a – sav (1798 – 15.2%) |
| #2 | app./zip (559 – 14.6%) | image/png – png (95,072 – 17.3%) | Excel 2007 – xlsx (3793 – 12.3%) | n/a – pdf (16,996 – 5.3%) | n/a – txt (1243 – 10.5%) |
| #3 | text/plain (57 – 1.5%) | app./fits – rf (94,028 – 17.1%) | text/csv – csv (3099 – 10%) | n/a (12,968 – 4%) | n/a – png (1059 – 9%) |
| #4 | app./octet-stream (27 – 0.7%) | app./fits – FTp (92,888 – 16.9%) | app./zip – zip (2834 – 9.2%) | n/a – docx (4868 – 1.5%) | n/a – fits (1043 – 8.8%) |
| #5 | app./x-hdf5 (22 – 0.6%) | app./fits – cf (82,204 – 14.9%) | Excel – xls (2074 – 6.7%) | n/a – doc (4511 – 1.4%) | n/a – zip (878 – 7.4%) |
| #6 | app./x-gzip (19 – 0.5%) | n/a – adf (9833 – 1.8%) | n/a – n/a (2007 – 6.5%) | n/a – xlsx (4012 – 1.2%) | n/a – gz (616 – 5.2%) |
| #7 | video/x-msvideo (10 – 0.3%) | n/a – dat (4920 – 0.9%) | text/plain – nex (1191 – 3.9%) | app./zip – zip (1988 – 0.6%) | n/a – csv (532 – 4.5%) |
| #8 | video/mpeg (9 – 0.2%) | n/a – nit (4911 – 0.9%) | app./pdf – pdf (1097 – 3.6%) | n/a – csv (1422 – 0.4%) | n/a – csv (269 – 2.3%) |
| #9 | app./x-gzip (8 – 0.2%) | image/tiff – tif (3926 – 0.7%) | app./x-gzip – gz (734 – 2.4%) | n/a – jpg (1395 – 0.4%) | n/a – itp (260 – 2.2%) |
| #10 | app./zip (4 – 0.1%) | n/a – 001 (1637 – 0.3%) | app./x-fasta – fasta (728 – 2.4%) | n/a – cif (1108 – 0.3%) | n/a – ods (205 – 1.7%) |
| Distinct | 53 | 1876 | 868 | 524 | 961 |
Table 6
Dataset attributes supported by Scientific Data Repositories.
| Attribute | 3TU.Dat. | CSIRO | Dryad | Figshare | Zenodo |
|---|---|---|---|---|---|
| Availability | ✓ | ✓ | ✓ | ✓ | ✓ |
| Bibliometric data | ✓ | ✓ | ✓ | | |
| Coverage | ✓ | ✓ | ✓ | | |
| Date | ✓ | ✓ | ✓ | ✓ | ✓ |
| Format | ✓ | ✓ | ✓ | | |
| License | ✓ | ✓ | ✓ | ✓ | |
| Minimal description | ✓ | ✓ | ✓ | ✓ | ✓ |
| Paper reference | ✓ | ✓ | ✓ | ✓ | |
| Project | ✓ | ✓ | ✓ | | |
| Provenance | ✓ | | | | |
| Subjects | ✓ | ✓ | ✓ | ✓ | ✓ |
Table 7
Top 5 licences associated with published datasets. ‘(c) CiTG Delft’ is ‘Delft University of Technology, Civil Engineering and Geosciences’; ‘Delft, KWR’ is ‘Delft University of Technology, KWR Watercycle Research Institute’; ‘openAccess’ is ‘info:eu-repo/semantics/openAccess’; ‘closedAccess’ is ‘info:eu-repo/semantics/closedAccess’.
| Licence | 3TU.Dat. | CSIRO | Dryad | Figshare | Zenodo |
|---|---|---|---|---|---|
| #1 | n/a (3453 – 97.85%) | CC-BY 3.0 (870 – 61.35%) | CC0 1.0 (29,025 – 94.19%) | n/a (308,108 – 96.16%) | CC0 1.0 (1041 – 64.7%) |
| #2 | CC BY-SA 3.0 (22 – 0.62%) | CSIRO Data Licence (328 – 23.13%) | n/a (1745 – 5.66%) | CC-BY (12,262 – 3.83%) | CC BY 4.0 (251 – 15.6%) |
| #3 | (c) CiTG Delft (18 – 0.51%) | CC-BY 4.0 (83 – 5.85%) | Unknown (8 – 0.02%) | CC0 (41 – 0.01%) | openAccess (175 – 10.88%) |
| #4 | Delft, KWR (16 – 0.45%) | No Licence (47 – 3.31%) | Custom (1 – n/a) | Apache-2.0 (2 – n/a) | CC BY-SA 4.0 (72 – 4.47%) |
| #5 | Public (12 – 0.34%) | CC BY-NC-ND 3.0 (45 – 3.17%) | Custom (1 – n/a) | GPL-3.0 (1 – n/a) | closedAccess (50 – 3.11%) |
| Distinct | 10 | 10 | 39 | 6 | 7 |
Table 8
Dataset discovery facilities. The CSIRO OAI-PMH facility is actually offered via the Research Data Australia service.
| Facility | 3TU.Dat. | CSIRO | Dryad | Figshare | Zenodo |
|---|---|---|---|---|---|
| End-user Facilities | | | | | |
| Keyword-based | ✓ | ✓ | ✓ | ✓ | ✓ |
| Field-based | ✓ | ✓ | ✓ | ✓ | ✓ |
| Browse | ✓ | ✓ | ✓ | ✓ | ✓ |
| Other | ✓ | ✓ | | | |
| Web-based API and Protocols | | | | | |
| Harvesting | OAI-PMH | OAI-PMH | OAI-PMH | In-house | OAI-PMH |
| Search | n/a | In-house | n/a | In-house | n/a |
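For the repositories listed with OAI-PMH harvesting, a harvester typically issues a `ListRecords` request and follows the returned `resumptionToken` until none remains. The sketch below only builds the request URL and parses a response; it does not contact any repository. The base URL `https://repository.example.org/oai`, the sample response, and the function names are assumptions for illustration and are not endpoints or code from the study. The in-house harvesting and search APIs are not covered.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response, used here instead of a live request.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint
first_url = list_records_url(BASE_URL)
identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url(BASE_URL, resumption_token=token)
```

Parsing is kept separate from transport so the same functions can be paired with any HTTP client when harvesting a real endpoint.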
