Abstract
Vision-Language Models (VLMs) are increasingly employed in art-historical research and teaching infrastructures to enhance the discovery of images in large visual collections. Unlike traditional metadata-based systems, where retrieval depends on human-curated textual descriptors, VLM-based approaches enable content-aware searches that directly compare visual structures. However, VLMs also embed culturally situated biases, compressing visual phenomena into opaque statistical representations. For art history, this tension is particularly significant: while VLMs facilitate the detection of visual motifs beyond categorical restrictions, their interpretability and verifiability remain limited compared to metadata-based systems, which integrate centuries of scholarly and contextual expertise. In this paper, we argue that VLM-driven approaches should augment, not replace, metadata-based infrastructures. We present a hybrid retrieval pipeline that integrates VLM-derived embeddings with structured metadata from Wikidata, using faceting mechanisms to organize and navigate multimodal results. By additionally deriving triplet-based assertions about depicted entities and linking them to existing metadata, our approach enhances both relevance and transparency in art-historical search. Implemented within a retrieval environment, this system exposes cultural and epistemic biases in both datasets and models, contributing to a reflective framework for applying Artificial Intelligence (AI) in the humanities.
