References
- Alencar, V., Kohwalter, T., Braganhole, V., da Silva, J. and Murta, L. (2024) ‘Prov-Dominoes: An approach for knowledge discovery from provenance data’, Expert Systems with Applications, 245. Available at: 10.1016/j.eswa.2023.123030
- Allahim, A., Shamsuddin, S.M. and Meulien, J. (2025) ‘Semantic approaches for query expansion: Taxonomy, challenges, and future research directions’, PeerJ Computer Science. Available at: 10.7717/peerj-cs.2664
- Amugongo, L.M., Mascheroni, P., Brooks, S., Doering, S. and Seidel, J. (2025) ‘Retrieval augmented generation for large language models in healthcare: A systematic review’, PLOS Digital Health, 4(6), e0000877. Available at: 10.1371/journal.pdig.0000877
- Bergold, J. and Thomas, S. (2012) ‘Participatory research methods: A methodological approach in motion’, Historical Social Research/Historische Sozialforschung, 37(4), pp. 191–222. Available at: https://www.jstor.org/stable/41756482
- Bugbee, K., le Roux, J., Sisco, A., Kaulfus, A., Staton, P., Woods, C., Dixon, V., Lynnes, C. and Ramachandran, R. (2021) ‘Improving discovery and use of NASA’s Earth observation data through metadata quality assessments’, Data Science Journal, 20(1), p. 17. Available at: 10.5334/dsj-2021-017
- Burton, A., Aryani, A., Koers, H., Manghi, P., Bruzzo, S.L., Stocker, M., Diepenbroek, M., Schindler, U. and Fenner, M. (2017) ‘The Scholix framework for interoperability in data-literature information exchange’, D-Lib Magazine, 23(1/2), pp. 1–20. Available at: 10.1045/january2017-burton
- Candela, L., Mangione, D. and Pavone, G. (2024) ‘The FAIR assessment conundrum: Reflections on tools and metrics’, Data Science Journal, 23(1), p. 33. Available at: 10.5334/dsj-2024-033
- Chae, Y. and Davidson, T. (2025) ‘Large language models for text classification: From zero-shot learning to instruction-tuning’, Sociological Methods & Research. Available at: 10.1177/00491241251325243
- Cooper, D.M. and Springer, R. (2019) ‘Data communities: A new model for supporting STEM data sharing’, Ithaka S+R, 13 May. Available at: 10.18665/sr.311396
- CoreTrustSeal Standards and Certification Board (2022) CoreTrustSeal requirements 2023–2025 (V01.00). Available at: 10.5281/zenodo.7051012
- DataCite (2024a) DataCite thriving communities: 3000 repositories and counting. Available at: 10.5438/63qf-5740 (Accessed: 17 April 2024).
- DataCite Metadata Working Group (2024b) DataCite metadata schema for the publication and citation of research data and other research outputs. Version 4.5. DataCite e.V. Available at: 10.14454/znvd-6q68
- Davenport, E. (2010) ‘Confessional methods and everyday life information seeking’, Annual Review of Information Science and Technology, 44(1), pp. 533–562. Available at: 10.1002/aris.2010.1440440119
- Dixit, R., Rogith, D., Narayana, V., Salimi, M., Gururaj, A., Ohno-Machado, L., Xu, H. and Johnson, T.R. (2018) ‘User needs analysis and usability assessment of DataMed – a biomedical data discovery index’, Journal of the American Medical Informatics Association, 25(3), pp. 337–344. Available at: 10.1093/jamia/ocx134
- Felden, J., Möller, L., Schindler, U., Huber, R., Schumacher, S., Koppe, R., Diepenbroek, M. and Glöckner, F.O. (2023) ‘PANGAEA – Data publisher for earth & environmental Science’, Scientific Data, 10, p. 347. Available at: 10.1038/s41597-023-02269-x
- Flanagan, J.C. (1954) ‘The critical incident technique’, Psychological Bulletin, 51(4), pp. 327–359. Available at: 10.1037/h0061470
- Foulonneau, M., Cole, T.W., Habing, T.G. and Shreeves, S.L. (2005) ‘Using collection descriptions to enhance an aggregation of harvested item-level metadata’, in Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries. Denver: Association for Computing Machinery, pp. 32–42. Available at: 10.1145/1065385.1065393
- Friedrich, T. (2020) Looking for data: Information seeking behaviour of survey data users (Doctoral dissertation). Humboldt-Universität zu Berlin. Available at: 10.18452/22173
- Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M. and Wang, H. (2023) ‘Retrieval-augmented generation for large language models: A survey’, arXiv:2312.10997v5 [cs.CL]. Available at: 10.48550/arXiv.2312.10997
- Gregory, A., Bell, D., Brickley, D., Buttigieg, P.L., Cox, S., Edwards, M., Doug, F., Gonzalez Morales, L.G., Heus, P., Hodson, S., Kanjala, C., Le Franc, Y., Maxwell, L., Molloy, L., Richard, S., Rizzolo, F., Winstanley, P. and Wyborn, L. (2024) WorldFAIR (D2.3) (version 1). Available at: 10.5281/zenodo.11236871
- Gregory, K., Groth, P., Scharnhorst, A. and Wyatt, S. (2020) ‘Lost or found? Discovering data needed for research’, Harvard Data Science Review, 2(2). Available at: 10.1162/99608f92.e38165eb
- Hodson, S. (2024) WorldFAIR (D2.2) WorldFAIR’s experience with FIPs (second set of FAIR implementation profiles for each case study) (version 1). Available at: 10.5281/zenodo.11236094
- Jeng, W., He, D. and Chi, Y. (2017) ‘Social science data repositories in data deluge: A case study of ICPSR’s workflow and practices’, The Electronic Library, 35(4), pp. 626–649. Available at: 10.1108/EL-11-2016-0243
- Kacprzak, E., Koesten, L., Tennison, J. and Simperl, E. (2018) ‘Characterising dataset search queries’, in WWW ’18: Companion Proceedings of the Web Conference 2018. Geneva: International World Wide Web Conferences Steering Committee, pp. 1485–1488. Available at: 10.1145/3184558.3191597
- Kalinin, N.A. and Skvortsov, N.A. (2023) ‘Difficulties of FAIR principles implementation in cross-domain research infrastructures’, Lobachevskii Journal of Mathematics, 44, pp. 147–156. Available at: 10.1134/S199508022301016X
- Koesten, L., Gregory, K., Groth, P. and Simperl, E. (2021) ‘Talking datasets—Understanding data sense-making behaviours’, International Journal of Human-Computer Studies, 146, 102562. Available at: 10.1016/j.ijhcs.2020.102562
- Koesten, L.M., Kacprzak, E., Tennison, J.F. and Simperl, E. (2017) ‘The trials and tribulations of working with structured data: A study on information seeking behaviour’, in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 1277–1289. Available at: 10.1145/3025453.3025838
- Khalsa, S., Cotroneo, P. and Wu, M. (2018) ‘A survey of current practices in data search services’, Mendeley Data, V1. Available at: 10.17632/7j43z6n22z.1
- Klump, J., Wyborn, L., Wu, M., Martin, J., Downs, R.R. and Asmi, A. (2021) ‘Versioning data is about more than revisions: A conceptual framework and proposed principles’, Data Science Journal, 20(1), p. 12. Available at: 10.5334/dsj-2021-012
- Krans, N.A., Ammar, A., Nymark, P., Willighagen, E.L., Bakker, M.I. and Quik, J.T.K. (2022) ‘FAIR assessment tools: Evaluating use and performance’, NanoImpact, 27, p. 100402. Available at: 10.1016/j.impact.2022.100402
- Lacagnina, C., David, R., Nikiforova, A., Kuusniemi, M.E., Cappiello, C., Biehlmaier, O., Wright, L., Schubert, C., Bertino, A., Thiemann, H. and Dennis, R. (2023) Towards a data quality framework for EOSC (1.0.0). Available at: 10.5281/zenodo.7515816
- Lafia, S., Million, A.J. and Hemphill, L. (2023) ‘Direct, orienting, and scenic paths: How users navigate search in a research data archive’, in Proceedings of the 2023 Conference on Human Information Interaction and Retrieval (CHIIR ’23). New York: Association for Computing Machinery, pp. 128–136. Available at: 10.1145/3576840.3578275
- Lang, J.M. and Benbow, M.E. (2013) ‘Species interactions and competition’, Nature Education Knowledge, 4(4), p. 8. Available at: https://www.nature.com/scitable/knowledge/library/species-interactions-and-competition-102131429/
- Lister, A. and Sansone, A. (2023, July 28) FAIRsharing in a nutshell. Available at: 10.5281/zenodo.8191958
- Liu, Y.-H., Wu, M., Power, M. and Burton, A. (2022) Elicitation of data discovery contexts: An interview study (1.0). Available at: 10.5281/zenodo.7179526
- Liu, Y.-H., Wu, M., Power, M. and Burton, A. (2023) Elicitation of contexts for discovering clinical trials and related health data: An interview study (V1.0). Available at: 10.5281/zenodo.7839282
- Löffler, F., Wesp, V., König-Ries, B. and Klan, F. (2021) ‘Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?’, PLoS ONE, 16(3), e0246099. Available at: 10.1371/journal.pone.0246099
- Löffler, F., Shafiei, F., Witte, R., König-Ries, B. and Klan, F. (2023) ‘Semantic search for biological datasets: A usability study on modes of querying and explaining search results’, 20th Conference on Database Systems for Business, Technology and Web, BTW 2023. Dresden, Germany, 6–10 March. Available at: 10.18420/BTW2023-56
- Manghi, P., Bardi, A., Atzori, C., Baglioni, M., Manola, N., Schirrwagen, J. and Principe, P. (2019) The OpenAIRE research graph data model. Available at: 10.5281/zenodo.2643199
- Marchionini, G. (2006) ‘Exploratory search: from finding to understanding’, Communications of the ACM, 49(4), pp. 41–42. Available at: 10.1145/1121949.1121979
- Million, A.J., York, J., Lafia, S. and Hemphill, L. (2025) ‘Data, not documents: Moving beyond theories of information-seeking behavior to advance data discovery’, Journal of the Association for Information Science and Technology, 76(4), pp. 649–664. Available at: 10.1002/asi.24962
- Miller, M. and Vielfaure, N. (2022) ‘OpenRefine: An approachable open tool to clean research data’, Bulletin – Association of Canadian Map Libraries and Archives (ACMLA), 170. Available at: 10.15353/acmla.n170.4873
- Nasir, J.A., Varlamis, I. and Ishfaq, S. (2019) ‘A knowledge-based semantic framework for query expansion’, Information Processing & Management, 56(5), pp. 1605–1617. Available at: 10.1016/j.ipm.2019.04.007
- National Library of Medicine (2021) SNOMED CT to ICD-10-CM map. Available at: https://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html
- NISO (National Information Standards Organization) (2004) Understanding metadata. Bethesda: NISO Press. Available at: https://www.niso.org/standards/resources/UnderstandingMetadata.pdf
- Peng, G., Berg-Cross, G., Wu, M., Downs, R.R., Shrestha, S.R., Wyborn, L., Ritchey, N., Ramapriyan, H.K., Clark, S.J., Wood, J., Liu, Z. and Marouane, A. (2024) ‘Harmonizing quality measures of FAIRness assessment towards machine-actionable quality information’, International Journal of Digital Earth, 17(1). Available at: 10.1080/17538947.2024.2390431
- Pressman, R. S. and Maxim, B. R. (2015). Software Engineering: A Practitioner’s Approach (8th ed.). McGraw-Hill Education.
- Quintel, D. and Wilson, R. (2020) ‘Analytics and privacy: Using Matomo in EBSCO’s discovery service’, Information Technology and Libraries, 39(3). Available at: 10.6017/ital.v39i3.12219
- Sharifpour, R., Wu, M. and Zhang, X. (2023) ‘Large-scale analysis of query logs to profile users for dataset search’, Journal of Documentation, 79(1), pp. 66–85. Available at: 10.1108/JD-12-2021-0245
- Shneiderman, B., Plaisant, C., Cohen, M., Jacobs, S., Elmqvist, N. and Diakopoulos, N. (2016) Designing the user interface: Strategies for effective human-computer interaction. 6th ed. Boston: Pearson.
- Silva, L. and Barbosa, L. (2024) ‘Improving dense retrieval models with LLM augmented data for dataset search’, Knowledge-Based Systems, 294, 111740. Available at: 10.1016/j.knosys.2024.111740
- Slavković, A. and Seeman, J. (2023) ‘Statistical data privacy: A song of privacy and utility’, Annual Review of Statistics and Its Application, 10, pp. 189–218. Available at: 10.1146/annurev-statistics-033121-112921
- Smith, L.C. (2020) ‘Interdisciplinary searching as a use case for vocabulary mapping’, in M. Lykke, T. Svarre, M. Skov and D. Martínez-Ávila (eds.) Knowledge organization at the interface: Proceedings of the sixteenth international ISKO conference, 2020 Aalborg, Denmark. vol. 17. Baden-Baden: Ergon-Verlag, pp. 428–435. Available at: 10.5771/9783956507762-428
- Sostek, K., Russell, D.M., Goyal, N., Alrashed, T., Dugall, S. and Noy, N. (2024) ‘Discovering datasets on the web scale: Challenges and recommendations for Google dataset search’, Harvard Data Science Review, special issue 4. Available at: 10.1162/99608f92.4c3e11ca
- Stall, S., Bilder, G., Cannon, M., Hong, N.C., Edmunds, S., Erdmann, C.C., Evans, M., Farmer, R., Feeney, P., Friedman, M., Giampoala, M., Hanson, R.B., Harrison, M., Karaiskos, D., Katz, D.S., Letizia, V., Lizzi, V., MacCallum, C., Meunch, A., Perry, K., Ratner, H., Schindler, U., Sedora, B., Stockhause, M., Townsend, R., Yeston, J. and Clark, T. (2023) ‘Journal production guidance for software and data citations’, Scientific Data, 10, 656. Available at: 10.1038/s41597-023-02491-7
- Sun, D., Hnatiuk, R.J. and Neldner, V.J. (1997) ‘Review of vegetation classification and mapping systems undertaken by major forested land management agencies in Australia’, Australian Journal of Botany, 45(6), pp. 929–948. Available at: 10.1071/BT96121
- Taniguchi, S. and Hashizume, A. (2023) ‘Transforming metadata content guidelines and instructions to linked data’, Journal of Information Science, 51(4). Available at: 10.1177/01655515221142428
- Thomas, K., Papenmeier, A., Carevic, Z., Kern, D. and Mathiak, B. (2021) ‘Data-seeking behaviour in the social sciences’, International Journal on Digital Libraries, 22(2), pp. 175–195. Available at: 10.1007/s00799-021-00303-0
- Terolli, E., Ernst, P. and Weikum, G. (2020) ‘Focused query expansion with entity cores for patient-centric health search’, in J.Z. Pan, V. Tamma, C. d’Amato, K. Janowicz, B. Fu, A. Polleres, O. Seneviratne and L. Kagal (eds.) The semantic web – ISWC 2020. Cham: Springer, pp. 547–564. Available at: 10.1007/978-3-030-62419-4_31
- Vega-Gorgojo, G., Slaughter, L., Giese, M., Heggestoyl, S., Soylu, A. and Waaler, A. (2016) ‘Visual query interfaces for semantic datasets: An evaluation study’, Journal of Web Semantics, 39(C), pp. 81–96. Available at: 10.2139/ssrn.3199241
- Wang, R. Y. and Strong, D. M. (1996). ‘Beyond Accuracy: What Data Quality Means to Data Consumers’, Journal of Management Information Systems, 12(4), pp. 5–33. Available at: 10.1080/07421222.1996.11518099
- Wang, X., Wang, Z., Gao, X., Zhang, F., Wu, Y., Xu, Y., Xu, Z., Shi, T., Wang, Z., Li, S., Qian, Q., Yin, R., Lv, C., Zheng, X. and Huang, X. (2024) ‘Searching for best practices in retrieval-augmented generation’, in Y. Al-Onaizan, M. Bansal and Y.-N. Chen (eds.) Proceedings of the 2024 conference on empirical methods in natural language processing. Miami: Association for Computational Linguistics, pp. 17716–17736. Available at: 10.18653/v1/2024.emnlp-main.981
- Weitz, J. (2020) ‘Improving WorldCat quality: Resolving to reduce duplicates’, Organizacija znanja, 25(1–2), 2025003. Available at: 10.3359/oz2025003
- Wentzel, B., Kirstein, F., Jastrow, T., Sturm, R., Peters, M. and Schimmler, S. (2023) ‘An extensive methodology and framework for quality assessment of DCAT-AP datasets’, in I. Lindgren, C. Csáki, E. Kalampokis, M. Janssen, G.V. Pereira, S. Virkar, E. Tambouris and A. Zuiderwijk (eds.) Electronic Government. Cham: Springer, pp. 262–278. Available at: 10.1007/978-3-031-41138-0_17
- White, R.W. (2016) Interactions with search systems. Cambridge: Cambridge University Press.
- Wilkinson, M., Dumontier, M., Aalbersberg, I., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., Hoen, P.A.C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J. and Mons, B. (2016) ‘The FAIR Guiding Principles for scientific data management and stewardship’, Scientific Data, 3, 160018. Available at: 10.1038/sdata.2016.18
- Wollin-Giering, S., Hoffmann, M., Höfting, J. and Ventzke, C. (2024) ‘Automatic transcription of English and German qualitative interviews’, Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 25(1). Available at: 10.17169/fqs-25.1.4129
- Wu, M. (2022) ARDC Project: Eliciting data search context. Available at: 10.5281/zenodo.6819787
- Wu, M., Juty, N., RDA Research Metadata Schemas WG, Collins, J., Duerr, R., Ridsdale, C., Shepherd, A., Verhey, C. and Castro, L.J. (2021) Guidelines for publishing structured metadata on the web (3.1). Available at: 10.15497/RDA00066
- Wu, M., Psomopoulos, F., Khalsa, S.J. and de Waard, A. (2019) ‘Data discovery paradigms: User requirements and recommendations for data repositories’, Data Science Journal, 18(1), p. 3. Available at: 10.5334/dsj-2019-003
- Wu, M., Brandhorst, H., Marinescu, M., Lopez, J.M., Hlava, M. and Busch, J. (2023) ‘Automated metadata annotation: What is and is not possible with machine learning’, Data Intelligence, 5(1), pp. 122–138. Available at: 10.1162/dint_a_00162
- Wu, M., Gregory, K., Löffler, F., Mathiak, B., Psomopoulos, F., Schindler, U., Aryani, A., Bodera, J., Castro, L.J., Culina, A., Czerniak, A., Erdmann, C., Grethe, J., Hellström, M., Henzen, C., Hunter, C., Juty, N., Kvale, L., Lister, A., Liu, Y.-H., Madon, B., Medina-Smith, A., Parton, G., Pearman-Kanza, S., Pörsch, A., Söding, E., Szabo, D., van der Meer, L., Weisweiler, N., Widmann, H. and Woodford, C.J. (2024) Ten principles to improve dataset discoverability (1.0). Available at: 10.15497/rda/00120
