Bridging the Data Discovery Gap: User-Centric Recommendations for Research Data Repositories

Open Access | Feb 2026

Figures & Tables

Figure 1

Iterative data discovery journey.

Table 1

Pros and cons of user study methods.

| Method | Description | Pros | Cons | Example tools |
| --- | --- | --- | --- | --- |
| Survey | Collects insights on user needs, functional requirements, user background, and satisfaction with a data discovery system (e.g., System Usability Score²; Khalsa et al., 2018). | Cost-effective and scalable; can reach a large number of users; can collect both quantitative and qualitative data. | Limited to self-reported perceptions; may lack depth; response bias can be an issue. | General: SurveyMonkey, Qualtrics, Google Forms, Typeform, Microsoft Forms. |
| Interview | Structured or unstructured conversation with an individual user or a group of users (focus group) (e.g., Liu et al., 2023; Sostek et al., 2024). | Provides in-depth insights into user motivations for a data search and experience with a discovery service; allows for clarification and follow-up questions; useful for exploratory research. | Time-consuming and resource-intensive; findings may not be easily generalizable; requires skilled interviewers and analysis of interview data. | Self-developed cheat sheet with interview steps and questions. AI tools can be used to transcribe interview recordings and analyse the transcripts (Wollin-Giering et al., 2024); their output needs to be verified by researchers. |
| Interaction log analysis | Analysis of search logs that capture user interactions with a system, allowing the analysis of search patterns, quality of relevance ranking, query and click behaviours, etc. (e.g., Kacprzak et al., 2018; Sharifpour et al., 2023). | Captures actual user behaviour beyond self-reporting; reveals data search patterns and system usage; can be tailored to specific research questions. | Lacks contextual explanations for behaviour; requires technical expertise for analysis; may not capture user motivations or frustrations directly. | Google Analytics or Matomo (Quintel and Wilson, 2020) for general web traffic, most visited pages, and user activities (e.g., page views, position of clicked search result). Custom code for advanced analysis of targeted research questions. |
| A/B testing | Compares two design alternatives to measure user preference and impact. This can be done with low-fidelity wireframes or fully functional systems (Vega-Gorgojo et al., 2016; Löffler et al., 2023). | Provides empirical evidence of design effectiveness; allows for direct comparison of alternatives; can reach a large number of users. | Requires careful experiment design to ensure validity and avoid bias; can be time- and cost-intensive; may not explain why one design performs better. | Either by randomly directing real users to alternative versions of a site and analysing the logs, or in a controlled setting where recruited users test different designs and provide feedback³ (tools such as Crazy Egg and Hotjar can record heatmaps and scroll maps). |
| Observational study | Observing users as they search for data, either in controlled environments or in their natural workflow (e.g., Thomas et al., 2021). | Provides direct insights into challenges and pain points; reveals unexpected behaviours. | Resource-intensive (time, personnel); potential for observer bias; user behaviour may be influenced by observation; findings may not be easily generalizable. | Can utilise screen and session recording tools, e.g., Lookback, Silverback, Hotjar, and Crazy Egg (for heatmaps). |
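
For the custom analysis mentioned under interaction log analysis, a minimal sketch in Python may help illustrate what such code can look like. It assumes a hypothetical log format of (session, query, rank of the clicked result or None); the record layout, the sample queries, and the function name `summarise` are illustrative assumptions and do not come from any of the tools or studies cited in Table 1. It computes, per query, the click-through rate and the mean reciprocal rank of the clicked result, two common indicators of click behaviour and ranking quality.

```python
from collections import defaultdict

# Hypothetical log records: (session_id, query, rank of clicked result or None).
# A rank of None means the search ended without a click.
LOG = [
    ("s1", "soil moisture", 1),
    ("s2", "soil moisture", 3),
    ("s3", "soil moisture", None),
    ("s4", "bird migration", 2),
    ("s5", "bird migration", None),
]


def summarise(log):
    """Return per-query search counts, click-through rate, and mean reciprocal rank."""
    per_query = defaultdict(lambda: {"searches": 0, "clicks": 0, "rr_sum": 0.0})
    for _session, query, rank in log:
        stats = per_query[query]
        stats["searches"] += 1
        if rank is not None:
            stats["clicks"] += 1
            # Reciprocal rank: 1 for the top result, 1/2 for the second, and so on.
            stats["rr_sum"] += 1.0 / rank

    summary = {}
    for query, stats in per_query.items():
        summary[query] = {
            "searches": stats["searches"],
            "click_through_rate": stats["clicks"] / stats["searches"],
            # Searches without a click contribute 0 to the mean.
            "mean_reciprocal_rank": stats["rr_sum"] / stats["searches"],
        }
    return summary


if __name__ == "__main__":
    for query, stats in summarise(LOG).items():
        print(query, stats)
```

A low click-through rate or a low mean reciprocal rank for a frequent query only flags where to look; as the table notes, logs alone do not explain why users behaved that way, so such figures are best paired with interviews or observation.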
Figure 2

Recommended user study methods at different stages of data discovery service development.

Figure 3

An example of a sub-level dataset which refers to a collection.

Figure 4

Screenshot of GFBio’s search interface with an expanded search including synonyms obtained from a terminology service.

Figure 5

An example of aggregating duplicated metadata records from several data catalogues.

Language: English
Submitted on: Jul 19, 2025 | Accepted on: Jan 6, 2026 | Published on: Feb 12, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Mingfang Wu, Felicitas Löffler, Brigitte Mathiak, Fotis Psomopoulos, Uwe Schindler, Amir Aryani, Jordi Bodera Sempere, Antica Culina, Andreas Czerniak, Chris Erdmann, Kathleen Gregory, Nick Juty, Allyson Lister, Ying-Hsang Liu, Samantha Pearman-Kanza, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.