
The FORBIN Dataset: A Collection of Historical Photographs With Archival Metadata
Abstract
Early photography and the first massification of the medium at the turn of the 20th century are foundational to present-day visual cultures. This is particularly true of the visual archives of the first generation of news picture agencies from the 1900s to the 1920s. The latest generation of artificial intelligence models makes it possible to analyse such photographs beyond their purely pictorial dimension, either by grouping them according to their semantic content or by generating textual descriptions. Cross-seeding approaches between historical epistemologies and computer vision expertise can unlock new perspectives on the computational analysis of large and poorly curated photographic archives and provide new insights into early photographic cultures. The automated detection of visual and textual contents, as well as the exploration of the transformation of continuous-tone photographs into halftone images, can help us understand the complexity and historical variability of the systems and structures that created, stored and circulated news photographs. In this paper, we introduce the FORBIN dataset. Victor Forbin, a Paris-based journalist, created his own news picture agency: he bought original prints from around the world and sold them to major French newspapers from the late 1900s to the 1930s. The FORBIN dataset is composed of 62,135 photographs, each digitized on both its front and back sides. This paper describes the history of this collection, its digitization and its ensuing transformation into a dataset. The FORBIN dataset can be used for various computer vision tasks, such as text recognition, image similarity analysis, caption generation, and metadata extraction.
© 2026 Mohamed Chelali, Sylvain-Karl Gosselet, Florence Cloppet, Camille Kurtz, Isabelle Bloch, Daniel Foliard, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.