
Figure 1
Iterative data discovery journey.
Table 1
Pros and cons of user study methods.
| METHODS | DESCRIPTION | PROS | CONS | EXAMPLE TOOLS |
|---|---|---|---|---|
| Survey | Collect insights on user needs, functional requirements, user background, and satisfaction with a data discovery system (e.g., the System Usability Scale; Khalsa et al., 2018). | Cost-effective, scalable, can reach a large number of users, can collect both quantitative and qualitative data. | Limited to self-reported perceptions, may lack depth, response bias can be an issue. | General: SurveyMonkey, Qualtrics, Google Forms, Typeform, Microsoft Forms |
| Interview | Structured or unstructured conversation with an individual user or a group of users (focus group) (e.g., Liu et al., 2023; Sostek et al., 2024). | Provides in-depth insights into user motivations for a data search and experience with a discovery service, allows for clarification and follow-up questions, useful for exploratory research. | Time-consuming, resource-intensive, findings may not be easily generalizable, requires skilled interviewers and careful analysis of interview data. | A self-developed cheat sheet with interview steps and questions. AI tools can be used to transcribe interview recordings and analyse interview transcripts (Wollin-Giering et al., 2024), although their output needs to be verified by researchers. |
| Interaction log analysis | Analysis of search logs that capture user interactions with a system, enabling the study of search patterns, quality of relevance ranking, query and click behaviours, etc. (e.g., Kacprzak et al., 2018; Sharifpour et al., 2023). | Captures actual user behaviour beyond self-reporting, reveals data search patterns and system usage, can be tailored to specific research questions. | Lacks contextual explanations for behaviour, requires technical expertise for analysis, may not capture user motivations or frustrations directly. | Google Analytics or Matomo (Quintel and Wilson, 2020) for general web traffic, most visited pages, and user activities (e.g., page views, position of clicked search results). Self-written code for advanced analysis of targeted research questions (see the sketch after the table). |
| A/B testing | Compares two design alternatives to measure user preference and impact; can be done with low-fidelity wireframes or fully functional systems (Vega-Gorgojo et al., 2016; Löffler et al., 2023). | Provides empirical evidence of design effectiveness, allows for direct comparison of alternatives, and can reach a large number of users. | Requires careful experiment design to ensure validity and avoid bias, can be time- and cost-intensive, may not explain why one design performs better. | Randomly directing real users to alternative designs and analysing the resulting logs, or a controlled setting in which recruited users test different designs and provide feedback (tools like Crazy Egg and Hotjar can record heatmaps and scroll maps). |
| Observational study | Observing users as they search for data, either in controlled environments or in their natural workflow (e.g., Thomas et al., 2021). | Provides direct insight into challenges and pain points, reveals unexpected behaviours. | Resource-intensive (time, personnel), potential for observer bias, user behaviour may be influenced by observation, findings may not be easily generalizable. | Can utilise screen and session recording tools, e.g., Lookback, Silverback, Hotjar, and Crazy Egg (for heatmaps). |
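For the self-written analysis code mentioned under interaction log analysis, and the log comparison used in A/B testing, a short script is often sufficient. The sketch below is a minimal illustration rather than a reference implementation: it assumes a hypothetical search-log export `search_log.csv` with columns `query`, `variant` (A or B), and `click_rank` (the 1-based position of the clicked result, empty when nothing was clicked). The file name and column layout are assumptions for illustration, not part of any particular analytics tool.

```python
import csv
import math
from collections import Counter, defaultdict

def load_log(path):
    """Read a hypothetical search-log export (CSV) into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def summarise(rows):
    """Log analysis: most frequent queries and mean rank of clicked results."""
    query_counts = Counter(r["query"].strip().lower() for r in rows)
    ranks = [int(r["click_rank"]) for r in rows if r["click_rank"]]
    mean_rank = sum(ranks) / len(ranks) if ranks else float("nan")
    return query_counts.most_common(10), mean_rank

def ab_click_through(rows):
    """A/B comparison: click-through rates of variants A and B,
    with a two-sided two-proportion z-test."""
    clicks, sessions = defaultdict(int), defaultdict(int)
    for r in rows:
        v = r["variant"]
        sessions[v] += 1
        if r["click_rank"]:          # any click counts as a "success"
            clicks[v] += 1
    p_a = clicks["A"] / sessions["A"]
    p_b = clicks["B"] / sessions["B"]
    p_pool = (clicks["A"] + clicks["B"]) / (sessions["A"] + sessions["B"])
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / sessions["A"] + 1 / sessions["B"]))
    z = (p_a - p_b) / se
    # Normal CDF via erf; two-sided p-value
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

if __name__ == "__main__":
    rows = load_log("search_log.csv")          # assumed export location
    top_queries, mean_rank = summarise(rows)
    print("Top queries:", top_queries)
    print("Mean clicked-result position:", round(mean_rank, 2))
    print("A vs B click-through (p_A, p_B, z, p):", ab_click_through(rows))
```

The same pattern extends to other session-level metrics (e.g., query reformulation rate or time to first click) once the corresponding fields are available in the log export.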

Figure 2
Recommended user study methods at different stages of data discovery service development.

Figure 3
An example of a sub-level dataset which refers to a collection.

Figure 4
Screenshot of GFBio’s search interface with an expanded search including synonyms obtained from a terminology service.

Figure 5
An example of aggregating duplicated metadata records from several data catalogues.
