The FORBIN Dataset: A Collection of Historical Photographs With Archival Metadata

Open Access | Apr 2026

Abstract

Early photography and the first massification of the medium at the turn of the 20th century are foundational to present-day visual cultures. This is especially true of the visual archives of the first generation of news picture agencies, from the 1900s to the 1920s. The latest generation of artificial intelligence models makes it possible to analyse such photographs beyond their purely pictorial dimension, either by grouping them according to their semantic content or by generating textual descriptions. Cross-seeding approaches between historical epistemologies and computer vision expertise can unlock new perspectives on the computational analysis of large and poorly curated photographic archives and provide new insights into early photographic cultures. The automated detection of visual and textual contents, as well as the exploration of the transformation of continuous-tone photographs into halftone images, can help us understand the complexity and historical variability of the systems and structures that created, stored and circulated news photographs. In this paper, we introduce the FORBIN dataset. Victor Forbin, a Paris-based journalist, created his own news picture agency: he bought original prints from around the world and sold them to major French newspapers from the late 1900s to the 1930s. The FORBIN dataset is composed of 62,135 photographs, each digitized on both its front and back sides. This paper describes the history of the collection, its digitization and its subsequent transformation into a dataset. The FORBIN dataset can be used for various computer vision tasks, such as text recognition, image similarity analysis, caption generation, and metadata extraction.

DOI: https://doi.org/10.5334/johd.487 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 21, 2025
Accepted on: Feb 17, 2026
Published on: Apr 6, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Mohamed Chelali, Sylvain-Karl Gosselet, Florence Cloppet, Camille Kurtz, Isabelle Bloch, Daniel Foliard, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.