Data Discovery Paradigms: User Requirements and Recommendations for Data Repositories

Open Access | Jan 2019

Figures & Tables

Table 1

Examples of use cases being re-written.

| As A | Theme | I want | So that |
| Ph.D Candidate | Economics | To have advanced search functionality | So he can refine a search when needed |
| Researcher | Herpetology | To find more data to correlate with the locations of her tortoise populations | So she can put her research into perspective and identify collaborators |
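
The four-column template in Table 1 can be read as a small record type. The following is a minimal sketch only: the class name `UseCase`, its field names, and the `sentence` helper are illustrative assumptions and are not part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """One rewritten use case; field names mirror the Table 1 columns (hypothetical)."""
    as_a: str
    theme: str
    i_want: str
    so_that: str

    def sentence(self) -> str:
        # Render the record as a single "As a ... I want ... so that ..." statement.
        return (
            f"As a {self.as_a} ({self.theme}), "
            f"I want {self.i_want}, so that {self.so_that}."
        )


# Values taken from the second row of Table 1.
case = UseCase(
    as_a="Researcher",
    theme="Herpetology",
    i_want="to find more data to correlate with the locations of her tortoise populations",
    so_that="she can put her research into perspective and identify collaborators",
)
print(case.sentence())
```

Keeping the four fields separate, rather than storing one free-text sentence, is what makes it possible to group use cases later by actor or theme.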
Figure 1

Two-layered grouping of use cases. The first and second layers come from axial coding and open coding, respectively.

Table 2

From Use Cases to Requirements.

| As A | Theme | I want | So that |
| Researcher | Social Science | To see what data is available right now | Make a forecast |
| Researcher | Social Science | Cares about Access Conditions | |
| Researcher | Physical Science | wants a very prominent Download button | |
| Researcher | Computer Science | see (data) publish date or available date | |
| Researcher | Health Science | data | |
| Requirement: Indication of Data availability |
Table 3

Nine user requirements elicited from use cases.

| User Requirements | User Type | Actors who can meet requirement | Description (extracts from the ‘so that’ field) |
| REQ 1. Indication of data availability | Researcher/Research Student | Data Repository Operator, Data Provider | If there is no clear indication of data availability, the search is usually dropped within the first 2 minutes. A ‘sort by availability’ function could also reveal potential data embargoes. Ideally, there should be a large, prominent ‘Download’ button. |
| REQ 2. Connection of data with person/institution/paper/citations/grants | Funder/Researcher/Research Student | Data Repository Operator, Data Provider | This allows for ranking of datasets, for connecting the displayed information with personal details, and for accountability. This information can also be used for grant applications and for comparative studies (datasets that underpin several papers). Finally, it should allow a manuscript to be uploaded for direct connection. |
| REQ 3. Fully annotated data (including granularity, origin, licensing, provenance, method of production, and times downloaded) | Researcher/Research Student | Data Provider, Data Repository Operator | This information validates the use of a dataset in a particular study and removes the step of having to read the corresponding manuscript to understand the data. To judge validity, users need to know where and when the data was measured, and the basic experimental and instrumental parameters. These are more important than, e.g., who created the data. To assess the validity of the data, users look at the repository/paper, then at the data itself to see if it makes sense. |
| REQ 4. Filtering of data based on specific criteria on multiple fields at the same time (such as release date, geo coverage, text content, date range, specific events) | Researcher/Research Student | Data Repository Operator, Data Provider | Support targeted studies (e.g. find global temperature records for volcanic eruptions in the last century; find articles on the Bronze Age in Britain). |
| REQ 5. Cross-referencing of data (same or different repositories) | Researcher/Research Student | Data Provider, Data Repository Operator | Having the same data under different identifiers is inconvenient for studies. Also, there are multiple instances/versions, and reproducibility requires using a specific one every time. Finally, cross-referencing avoids duplication and maximizes efficiency and access. |
| REQ 6. Visual analytics/inspection of data/thumbnail preview | Researcher | Data Repository Operator | Decide if this dataset is right for a research purpose. Also allows quick visual filtering of a results set. |
| REQ 7. Sharing data (either whole dataset, particular records, or bibliographic information) in a collaborative environment | Researcher/Research Student | Data Repository Operator | Make sure that there is a common space for keeping both data and their versions across time. This alleviates the need to rerun a search at the last minute to check that nothing has been published since the last study/search, or to share bibliographic information about data. |
| REQ 8. Accompanying educational/training material | Librarian | Research Office/Libraries, Data Repository | Help researchers manage and discover data in a methodical and seamless manner. |
| REQ 9. Portal functionality similar to other established academic portals | Researcher | Data Repository Operator | For example, finding more within a subject, visual search (i.e. drawing a structure to search for), free-text search, query-building functionality, subscription, saved lists. |
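
REQ 4 in Table 3 asks for filtering on several fields at once. The sketch below illustrates that idea on an in-memory list; the `DatasetRecord` fields, the function names, and the three sample records are all hypothetical and do not reflect any particular repository's schema or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    # Hypothetical metadata fields; real repositories define their own schemas.
    title: str
    release_date: date
    region: str
    description: str


def matches(record, *, text=None, region=None,
            released_after=None, released_before=None):
    """Return True only if the record satisfies every criterion that was supplied."""
    haystack = f"{record.title} {record.description}".lower()
    if text is not None and text.lower() not in haystack:
        return False
    if region is not None and record.region.lower() != region.lower():
        return False
    if released_after is not None and record.release_date < released_after:
        return False
    if released_before is not None and record.release_date > released_before:
        return False
    return True


def filter_records(records, **criteria):
    """Apply all supplied criteria together (logical AND)."""
    return [r for r in records if matches(r, **criteria)]


# Invented sample records, loosely echoing the examples in REQ 4.
records = [
    DatasetRecord("Global temperature records", date(1995, 6, 1), "Global",
                  "Surface temperatures following volcanic eruptions"),
    DatasetRecord("Bronze Age settlements", date(2010, 3, 15), "Britain",
                  "Survey of bronze age sites"),
    DatasetRecord("Regional rainfall", date(2018, 1, 20), "Britain",
                  "Monthly rainfall totals"),
]

hits = filter_records(records, text="bronze age", region="Britain")
print([r.title for r in hits])
```

Criteria that are left out are simply ignored, so the same function covers single-field and multi-field queries.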
Figure 2

Query interface from National Snow & Ice Data Center (http://nsidc.org/data/search/), with spatial and temporal search up front.

Figure 3

An example of preview data from Elsevier Datasearch (https://datasearch.elsevier.com/#/).

Table 4

Matching requirements to recommendations.

| Requirement | Recommendations |
| REQ1: data availability | REC3 Assessable search result |
| REQ2: Connection of data | REC2 Multiple access points; REC4 Readable metadata records; REC8 Identifiable duplicates |
| REQ3: Annotations | REC3 Assessable search result; REC4 Readable and analysable metadata records; REC6 Available data usage statistics |
| REQ4: Filtering with single or multiple criteria | REC1 Multiple query interfaces; REC2 Multiple access points |
| REQ5: Cross-reference | REC8 Identifiable duplicates |
| REQ6: Inspection of data | REC1 Multiple query interfaces; REC2 Multiple access points; REC3 Assessable search result |
| REQ7: Collaborative environment | REC5 Available bibliographic references |
| REQ8: Training material | Eleven quick tips for finding research data |
| REQ9: Similarity across portals | REC1 Multiple query interfaces; REC2 Multiple access points; REC7 Consistent interface |
| Support data searchers from web search engines | REC9 Findable from web search engines |
| The FAIR Data Principles: interoperability | REC10 Interoperability with other repositories |
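
The mapping in Table 4 is many-to-many, so it can be useful to read it in the other direction, from a recommendation back to the requirements it serves. The sketch below transcribes the REQ-to-REC rows of Table 4 and inverts them; the variable and function names are illustrative assumptions. REQ8 and the last two rows are left out because they do not pair a numbered requirement with a numbered recommendation.

```python
from collections import defaultdict

# Requirement -> recommendations, transcribed from Table 4.
# REQ8 (mapped to a separate tips document) and the REC9/REC10 rows
# (not tied to a numbered requirement) are intentionally omitted.
REQ_TO_REC = {
    "REQ1": ["REC3"],
    "REQ2": ["REC2", "REC4", "REC8"],
    "REQ3": ["REC3", "REC4", "REC6"],
    "REQ4": ["REC1", "REC2"],
    "REQ5": ["REC8"],
    "REQ6": ["REC1", "REC2", "REC3"],
    "REQ7": ["REC5"],
    "REQ9": ["REC1", "REC2", "REC7"],
}


def invert(mapping):
    """Build recommendation -> requirements from requirement -> recommendations."""
    inverted = defaultdict(list)
    for req, recs in mapping.items():
        for rec in recs:
            inverted[rec].append(req)
    return dict(inverted)


REC_TO_REQ = invert(REQ_TO_REC)
print(REC_TO_REQ["REC2"])
```

Read this way, a repository operator could see which requirements a single recommendation touches before deciding which one to implement first.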
Language: English
Submitted on: Aug 22, 2018
Accepted on: Dec 13, 2018
Published on: Jan 8, 2019
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2019 Mingfang Wu, Fotis Psomopoulos, Siri Jodha Khalsa, Anita de Waard, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.