The FORBIN Dataset: A Collection of Historical Photographs With Archival Metadata

Open Access | Apr 2026

References

  1. Aissi, M. S., Giardinetti, M., Bloch, I., Schuh, J., & Foliard, D. (2025). Computer vision and halftone visual culture: improving similarity search for historical photographs. Multimedia Tools and Applications. 10.1007/s11042-025-20855-6
  2. Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Lehmann, J., Neudecker, C., … van Strien, D. (2023). Datasheets for digital cultural heritage datasets. Journal of Open Humanities Data. 10.5334/johd.124
  3. Antonacopoulos, A., Gatos, B., & Bridson, D. (2007). Page segmentation competition. In International conference on document analysis and recognition (Vol. 2, pp. 1279–1283). 10.1109/ICDAR.2007.4377121
  4. Arnold, T., & Tilton, L. (2023). Distant viewing: computational exploration of digital images (1st ed.). The MIT Press. 10.1093/llc/fqz013
  5. Campello, R. J. G. B., Moulavi, D., & Sander, J. (2013). Density-based clustering based on hierarchical density estimates. In J. Pei, V. S. Tseng, L. Cao, H. Motoda, & G. Xu (Eds.), Advances in knowledge discovery and data mining (pp. 160–172). 10.1007/978-3-642-37456-2_14
  6. Chumachenko, K., Männistö, A., Iosifidis, A., & Raitoharju, J. (2020). Machine learning based analysis of Finnish World War II photographers. IEEE Access, 8, 144184–144196. 10.1109/ACCESS.2020.3014458
  7. Clausner, C., Antonacopoulos, A., & Pletschacher, S. (2017). ICDAR2017 competition on recognition of documents with complex layouts. In International conference on document analysis and recognition (Vol. 1, pp. 1404–1410). 10.1109/ICDAR.2017.229
  8. Colavizza, G., Blanke, T., Jeurgens, C., & Noordegraaf, J. (2022). Archives and AI: an overview of current debates and future perspectives. ACM Journal on Computing and Cultural Heritage, 15(1), 4:1–4:15. 10.1145/3479010
  9. Denton, E., Hanna, A., Amironesei, R., Smart, A., & Nicole, H. (2021). On the genealogy of machine learning datasets: A critical history of ImageNet. Big Data and Society, 8(2). 10.1177/20539517211035955
  10. Du, L., Le, B., & Honig, E. (2024, January). Probing historical image contexts: Enhancing visual archive retrieval through computer vision. ACM Journal on Computing and Cultural Heritage, 16(4). 10.1145/3631129
  11. Efthymiadis, N., Psomas, B., Laskar, Z., Karantzalos, K., Avrithis, Y., Chum, O., & Tolias, G. (2025). Composed image retrieval for training-free domain conversion. In IEEE/CVF winter conference on applications of computer vision (pp. 1723–1733). 10.1109/WACV61041.2025.00175
  12. Ehrmann, M., Düring, M., Neudecker, C., & Doucet, A. (2022). Computational approaches to digitised historical newspapers. Dagstuhl Reports, 12(7), 112–179. 10.4230/DAGREP.12.7.112
  13. Fischer, A., Indermühle, E., Bunke, H., Viehhauser, G., & Stolz, M. (2010). Ground truth creation for handwriting recognition in historical documents. In IAPR international workshop on document analysis systems (pp. 3–10). 10.1145/1815330.1815331
  14. Giardinetti, M., Foliard, D., Schuh, J., & Aissi, M.-S. (2024). The EyCon dataset: A visual corpus of early conflict photography. Journal of Open Humanities Data. 10.5334/johd.213
  15. Gruning, T., Labahn, R., Diem, M., Kleber, F., & Fiel, S. (2018). READ-BAD: A new dataset and evaluation scheme for baseline detection in archival documents. In International workshop on document analysis systems (pp. 351–356). 10.1109/DAS.2018.38
  16. Gupta, A., Vedaldi, A., & Zisserman, A. (2016). Synthetic data for text localisation in natural images. In IEEE conference on computer vision and pattern recognition (pp. 2315–2324). 10.1109/CVPR.2016.254
  17. Gutehrlé, N., & Atanassova, I. (2021). Logical Layout Analysis Applied to Historical Newspapers. In Workshop on natural language processing for digital humanities (pp. 85–94).
  18. He, T., Tian, Z., Huang, W., Shen, C., Qiao, Y., & Sun, C. (2018). An end-to-end textspotter with explicit alignment and attention. In IEEE conference on computer vision and pattern recognition (pp. 5020–5029). 10.1109/CVPR.2018.00527
  19. Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., … Fei-Fei, L. (2017). Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1), 32–73. 10.1007/s11263-016-0981-7
  20. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems 25: 26th annual conference on neural information processing systems (pp. 1106–1114). 10.1145/3065386
  21. Lee, B. C. G., Mears, J., Jakeway, E., Ferriter, M., Adams, C., Yarasavage, N., … Weld, D. S. (2020). The newspaper navigator dataset: Extracting headlines and visual content from 16 million historic newspaper pages in Chronicling America. In International conference on information and knowledge management (pp. 3055–3062). 10.1145/3340531.3412767
  22. Lin, T., Maire, M., Belongie, S. J., Hays, J., Perona, P., Ramanan, D., … Zitnick, C. L. (2014). Microsoft COCO: common objects in context. In European conference on computer vision (Vol. 8693, pp. 740–755). 10.1007/978-3-319-10602-1_48
  23. Lo, K., Shen, Z., Newman, B., Chang, J. C., Authur, R., Bransom, E., … Soldaini, L. (2023). PaperMage: A unified toolkit for processing, representing, and manipulating visually-rich scientific documents. In Conference on empirical methods in natural language processing (pp. 495–507). 10.18653/v1/2023.emnlp-demo.45
  24. Papadopoulos, C., Pletschacher, S., Clausner, C., & Antonacopoulos, A. (2013). The IMPACT dataset of historical document images. In International workshop on historical document imaging and processing (pp. 123–130). 10.1145/2501115.2501130
  25. Plummer, B. A., Wang, L., Cervantes, C. M., Caicedo, J. C., Hockenmaier, J., & Lazebnik, S. (2015). Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In IEEE international conference on computer vision (pp. 2641–2649). 10.1109/ICCV.2015.303
  26. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In International conference on machine learning (Vol. 139, pp. 8748–8763).
  27. Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., … Jitsev, J. (2022). LAION-5B: an open large-scale dataset for training next generation image-text models. In Advances in neural information processing systems. 10.52202/068431-1833
  28. Simistira, F., Bouillon, M., Seuret, M., Würsch, M., Alberti, M., Ingold, R., & Liwicki, M. (2017). ICDAR2017 competition on layout analysis for challenging medieval manuscripts. In International conference on document analysis and recognition (pp. 1361–1370). 10.1109/ICDAR.2017.223
  29. Smits, T., & Ros, R. (2023). Distant reading 940,000 online circulations of 26 iconic photographs. New Media & Society, 25(12), 3543–3572. 10.1177/14614448211049459
  30. Smits, T., Warner, B., Fyfe, P., & Lee, B. C. G. (2025). A fully-searchable multimodal dataset of the Illustrated London News, 1842–1890. Journal of Open Humanities Data. 10.5334/johd.284
  31. Strauß, T., Leifert, G., Labahn, R., Hodel, T., & Mühlberger, G. (2018). ICFHR2018 competition on automated text recognition on a READ dataset. In International conference on frontiers in handwriting recognition (pp. 477–482). 10.1109/ICFHR-2018.2018.00089
  32. Studer, M., Clérot, F., et al. (2023). FINLAM: Finnish language model and dataset for historical newspapers. In Newspapers & magazines AI models: Training and re-use in the digital humanities. Vienna, Austria.
  33. van Wissen, L., Vriend, N., Nijssen, A., Vereecken, L., & den Engelse, M. (2025). FAIR Photos – transforming a collection of two million historical press photos into five star data. Journal of Open Humanities Data. 10.5334/johd.271
  34. Veit, A., Matera, T., Neumann, L., Matas, J., & Belongie, S. J. (2016). COCO-Text: Dataset and benchmark for text detection and recognition in natural images. CoRR, abs/1601.07140.
  35. Wang, K., Babenko, B., & Belongie, S. J. (2011). End-to-end scene text recognition. In IEEE international conference on computer vision (pp. 1457–1464). 10.1109/ICCV.2011.6126402
  36. Wu, S., Schindler, K., Heitzler, M., & Hurni, L. (2023). Domain adaptation in segmenting historical maps: A weakly supervised approach through spatial co-occurrence. Journal of Photogrammetry and Remote Sensing, 197, 199–211. 10.1016/j.isprsjprs.2023.01.021
  37. Zhang, J., Huang, Q., Liu, J., Guo, X., & Huang, D. (2025). Diffusion-4K: Ultra-high-resolution image synthesis with latent diffusion models. In IEEE computer vision and pattern recognition (pp. 23464–23473). 10.1109/CVPR52734.2025.02185
DOI: https://doi.org/10.5334/johd.487 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 21, 2025
Accepted on: Feb 17, 2026
Published on: Apr 6, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Mohamed Chelali, Sylvain-Karl Gosselet, Florence Cloppet, Camille Kurtz, Isabelle Bloch, Daniel Foliard, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.