
Figure 1
Illustration of the visual heterogeneity across heritage datasets. The EyCon photograph (Giardinetti et al., 2024) shows overlapping prints, typical of archival contact sheets, while the Forbin image presents a caption attached to the lower margin on the recto and stamps on the verso. Variations in contrast and tone result from differences in material supports and digitization workflows.

Figure 2
Comparison of the scale of the proposed Forbin dataset with existing benchmarks. Each dataset is shown in blue if monomodal and in red if multimodal, and is positioned according to its targeted tasks (vertical axis) and its total number of samples (horizontal axis).
Table 1
Comparison of cultural and historical datasets according to the tasks they cover. While most existing collections focus on a single objective, such as handwritten text recognition, document layout analysis, or photo archiving, the Forbin dataset jointly addresses several complementary tasks. It combines annotated historical photographs with metadata, textual content (both printed and handwritten), and scene text annotations, offering a unified resource for the multimodal analysis of historical visual materials.
| DATASET | SIZE | HANDWRITTEN TEXT RECOGNITION | METADATA | HISTORICAL PHOTOS | OCR & SCENE TEXT | LAYOUT & DOCUMENT ANALYSIS |
|---|---|---|---|---|---|---|
| READ | 400K | ✓ | — | — | — | — |
| IMPACT | 20K | — | — | — | ✓ | — |
| ICFHR2018 | 3K | ✓ | — | — | — | — |
| FINLAM | 161K | — | — | — | — | ✓ |
| Newspaper Navigator | 16M | — | — | — | — | ✓ |
| Bain Collection | 40K | — | — | ✓ | — | — |
| Finnish WWII | 160K | — | ✓ | ✓ | — | — |
| EyCon | 130K | — | ✓ | ✓ | — | — |
| Forbin dataset | 120K | ✓ | ✓ | ✓ | ✓ | — |

Figure 3
Overview of the construction workflow of the Forbin dataset and selected samples. (a) Photographs were taken from archival boxes and digitized by scanning; metadata on provenance and digitization conditions were collected during this process. (b) Digitized samples illustrating the visual diversity and quality of the data. The recto may show an original print or a retouched photograph prepared for layout and publication, while the verso often contains handwritten notes, captions in multiple languages, stamps, and editorial marks.

Figure 4
Three pairs of images from the proposed Forbin dataset, showing the recto (first row) and verso (second row) of each photograph, together with their associated metadata (bottom row). The metadata include the image identifier, archival box name, and country of origin. The continent field results from a manual classification into continental groups, or into thematic groups when the geographical origin is unknown, while the cluster and cluster name are derived from HDBSCAN clustering of the class attributes across the entire dataset.
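The metadata fields listed in Figure 4 can be modeled as a simple record. The sketch below is a minimal illustration under stated assumptions: the field names (`image_id`, `box_name`, `classes`, and so on) are hypothetical, and each image is assumed to carry a set of semantic class labels that are multi-hot encoded before clustering. The HDBSCAN step itself is omitted to keep the example dependency-free.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ForbinMetadata:
    # Hypothetical field names mirroring the metadata shown in Figure 4.
    image_id: str
    box_name: str
    country: Optional[str]  # None when the geographical origin is unknown
    continent: str  # continental group, or thematic group if origin is unknown
    classes: set[str] = field(default_factory=set)  # assumed semantic class labels
    cluster: Optional[int] = None  # to be filled in by a clustering step
    cluster_name: Optional[str] = None


def multi_hot(records: list[ForbinMetadata]) -> tuple[list[str], list[list[int]]]:
    """Encode each record's class labels as a 0/1 vector over a shared, sorted vocabulary."""
    vocab = sorted({c for r in records for c in r.classes})
    index = {c: i for i, c in enumerate(vocab)}
    vectors = []
    for r in records:
        v = [0] * len(vocab)
        for c in r.classes:
            v[index[c]] = 1
        vectors.append(v)
    return vocab, vectors


# Toy records with made-up values, for illustration only.
records = [
    ForbinMetadata("img_0001", "box_A", "France", "Europe", {"Military life", "class_x"}),
    ForbinMetadata("img_0002", "box_B", None, "theme_y", {"Military life"}),
]
vocab, X = multi_hot(records)
print(vocab)  # ['Military life', 'class_x']
print(X)      # [[1, 1], [1, 0]]
```

With scikit-learn 1.3 or later, vectors like `X` could be passed to `sklearn.cluster.HDBSCAN`, whose `fit_predict` marks noise points with the label -1; the clustering parameters used for the Forbin dataset are not stated in this section.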

Figure 5
Distribution of the 30 semantic classes across continents. The pie charts reveal a marked thematic imbalance, with “Military life” dominating in most regions except Africa. This visualization underscores regional variations in subject matter within the Forbin dataset. The complete list of classes and their corresponding color codes is provided below.
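The per-continent shares behind such pie charts can be computed with simple counting. This is a minimal sketch, assuming the labels are available as (continent, class) pairs and that an image with several classes contributes one pair per class; the function names and toy class names are hypothetical, and the toy numbers are not Forbin statistics.

```python
from collections import Counter, defaultdict


def class_distribution(pairs):
    """Share of each class among the labels of each continent.

    `pairs` yields (continent, class_name) tuples; an image carrying several
    classes contributes one pair per class (an assumption about the counting).
    """
    counts = defaultdict(Counter)
    for continent, cls in pairs:
        counts[continent][cls] += 1
    return {
        continent: {cls: n / sum(counter.values()) for cls, n in counter.items()}
        for continent, counter in counts.items()
    }


def dominant_class(shares):
    """Most frequent class per continent; ties are broken alphabetically."""
    return {cont: min(dist, key=lambda c: (-dist[c], c)) for cont, dist in shares.items()}


# Toy input with placeholder class names, not actual Forbin statistics.
pairs = [
    ("Europe", "Military life"), ("Europe", "Military life"), ("Europe", "class_a"),
    ("Africa", "class_b"), ("Africa", "class_b"), ("Africa", "Military life"),
]
shares = class_distribution(pairs)
print(dominant_class(shares))  # {'Europe': 'Military life', 'Africa': 'class_b'}
```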

Figure 6
Number of annotated instances per category. Bars sharing the same color belong to the same super category.
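Counting instances per category and aggregating them by super category is a two-level tally. The sketch below assumes each annotation is a dictionary with a `category` key; the category names and the `SUPER_CATEGORY` mapping are hypothetical placeholders, since the actual Forbin categories are not listed in this section.

```python
from collections import Counter

# Hypothetical category names and super-category mapping, for illustration only.
SUPER_CATEGORY = {
    "handwritten_note": "handwritten",
    "handwritten_caption": "handwritten",
    "printed_caption": "printed",
    "stamp": "stamp",
}


def instances_per_category(annotations):
    """Count annotated instances per category and aggregate them per super category."""
    per_category = Counter(a["category"] for a in annotations)
    per_super = Counter()
    for category, n in per_category.items():
        per_super[SUPER_CATEGORY.get(category, "other")] += n
    return per_category, per_super


annotations = [
    {"category": "handwritten_note"},
    {"category": "handwritten_caption"},
    {"category": "handwritten_note"},
    {"category": "stamp"},
]
per_category, per_super = instances_per_category(annotations)
print(per_category["handwritten_note"], per_super["handwritten"])  # 2 3
```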

Figure 7
Annotated photographs from the Forbin dataset. Each image displays polygons corresponding to different annotation categories, with transcribed text linked to each region. The examples highlight the diversity of text layouts, orientations, and typographies captured in the dataset.
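Each annotated region pairs a polygon with its transcription. The sketch below shows one possible in-memory representation, assuming polygons are stored as ordered vertex lists in pixel coordinates; the `TextRegion` class and its fields are hypothetical. Polygons can follow rotated or slanted text, whereas the derived axis-aligned box is only a looser crop, for instance to feed a text recognizer.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class TextRegion:
    # Hypothetical structure: one annotated polygon and the text transcribed in it.
    category: str
    polygon: list[Point]  # vertices in image pixel coordinates
    transcription: str

    def bbox(self) -> tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)


# A slanted quadrilateral, e.g. a rotated caption; values are made up.
region = TextRegion("handwritten_note", [(10, 40), (100, 10), (110, 40), (20, 70)], "placeholder text")
print(region.bbox())  # (10, 10, 110, 70)
```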
