Table 1
Layout statistics for the main categories. Other covers all remaining categories, mainly empty images or images that do not contain migration records (due to incorrect metadata or mixed data within a book).
| LAYOUT TYPE | IMAGES | % OF DATA |
|---|---|---|
| handdrawn | 94,477 | 47.07% |
| preprinted | 79,243 | 39.48% |
| half-table | 14,162 | 7.06% |
| free text | 9,457 | 4.71% |
| other | 3,395 | 1.69% |

Figure 1
Cumulative counts for different preprinted layout types.

Figure 2
Example of a handdrawn moving table (Huittinen 1878). FFHA’s digital archive.

Figure 3
Example of a preprinted moving table (Heinävesi 1909). FFHA’s digital archive.
Table 2
Typical elements of migration tables.
| DATA | DESCRIPTION |
|---|---|
| Reference number | An identifier for the record, which may represent a page reference, an order number within a specific year, or other context-dependent information. |
| Date | Date of the recording, not necessarily the actual moving date. |
| Occupation and name | Name of the person or main person of the family and his/her occupation. |
| Number of persons | Number of people moving, with females and males counted separately. |
| Where to/Where from | Name of the new or old parish, depending on whether the record is a move-out or a move-in. |
| Reference to communion book | Reference to the page in the communion book where other details about the person are recorded. |
| Notes | Other related markings. |

Figure 4
Details of a typical moving table entry from Hankasalmi: Maria Sirkka, a servant (piika), moved to Rautalampi on January 9th. She is female (naisenpuoli), born on March 25, 1857, in Rautalampi. Her marital status is single, and her occupation is servant (palvelus). Additional information can be found in the communion book on page 296. No further remarks are recorded.
Table 3
Summary of manually annotated data for different stages of the pipeline, divided into training, development, and test sets. Image and cell counts are shown separately.
| ANNOTATION TYPE | TRAIN IMAGES | TRAIN CELLS | DEV IMAGES | DEV CELLS | TEST IMAGES | TEST CELLS | TOTAL IMAGES | TOTAL CELLS |
|---|---|---|---|---|---|---|---|---|
| De-skew key points | 900 | – | 190 | – | 200 | – | 1,290 | – |
| Table structure | 1,252 | – | 188 | – | 192 | – | 1,632 | – |
| Cell type | 230 | 47,000 | 47 | 14,000 | 46 | 16,000 | 323 | 77,000 |
| Text recognition | – | – | 41 | 1,947 | 39 | 2,277 | 80 | 4,224 |
| Year recognition | 1,026 | – | 188 | – | 192 | – | 1,326 | – |

Figure 5
Text recognition for tabular data.

Figure 6
Extreme example of page skew (left) and the output of the de-skew process (right). Red circles mark stage-I corner recognition, green dots mark stage-II corner recognition.

Figure 7
De-skew process. Two pages of the opening with the relevant six keypoints A-F and the image frame (dashed line).

Figure 8
Example of how clustering improves results. In some cases, the table cell detection model fails to detect all cells in a table (black circles on the left-hand side). By applying a clustering method, these gaps can be filled (black circles on the right-hand side).

Figure 9
Example of an opening with several year mentions outside of the header area.
Table 4
Skew angle of the left, middle, and right borders, in degrees of deviation from vertical, reported on the test set. The angles in the original image (Base) are calculated from the manual annotation of the test set images; Stage I and Stage II are the two stages of the de-skew algorithm.
| | LEFT | MIDDLE | RIGHT |
|---|---|---|---|
| Base | 0.33° ± 0.78 | –0.06° ± 0.53 | –0.28° ± 0.77 |
| Stage I | 0.17° ± 0.62 | –0.15° ± 0.43 | –0.27° ± 0.64 |
| Stage II | 0.08° ± 0.59 | 0.06° ± 0.82 | –0.005° ± 0.69 |
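
The angles above can be read as the deviation of a page border from the vertical image axis. A minimal Python sketch of that measurement follows, assuming each border is described by a top and a bottom key point in pixel coordinates with y growing downward. The function name `skew_from_vertical`, the sign convention, and the sample coordinates are illustrative assumptions, not details taken from the pipeline.

```python
import math

def skew_from_vertical(top, bottom):
    """Signed angle in degrees between the line top->bottom and the vertical axis.

    Assumed convention: points are (x, y) pixels with y increasing downward;
    a positive angle means the bottom end lies to the right of the top end.
    """
    dx = bottom[0] - top[0]
    dy = bottom[1] - top[1]
    return math.degrees(math.atan2(dx, dy))

# Hypothetical border that drifts 10 px to the right over 1000 px of height.
print(round(skew_from_vertical((100, 0), (110, 1000)), 2))  # 0.57
```

A perfectly vertical border gives 0°, so values closer to zero after Stage I and Stage II indicate a better-aligned page.
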
Table 5
Table detection.
| TABLE TYPE | ACCURACY | RECALL | PRECISION | F1-SCORE |
|---|---|---|---|---|
| Preprinted | 93.2 | 93.2 | 100.0 | 96.5 |
| Handdrawn | 95.4 | 95.4 | 100.0 | 97.6 |
| All | 94.2 | 94.2 | 100.0 | 97.0 |
Table 6
Row detection.
| TABLE TYPE | ACCURACY | RECALL | PRECISION | F1-SCORE |
|---|---|---|---|---|
| Preprinted | 95.1 | 96.4 | 98.7 | 97.5 |
| Handdrawn | 87.9 | 93.7 | 93.4 | 93.6 |
| All | 91.4 | 95.1 | 96.0 | 95.5 |
Table 7
Column detection.
| TABLE TYPE | ACCURACY | RECALL | PRECISION | F1-SCORE |
|---|---|---|---|---|
| Preprinted | 96.1 | 99.1 | 96.9 | 98.0 |
| Handdrawn | 92.4 | 98.3 | 93.9 | 96.1 |
| All | 94.4 | 98.7 | 95.6 | 97.1 |
Table 8
Cell type classification performance, with Precision, Recall, and F1-score reported separately for each class.
| CELL TYPE | PRECISION | RECALL | F1-SCORE | SUPPORT |
|---|---|---|---|---|
| single-line | 96.3 | 87.3 | 91.6 | 9829 |
| empty | 81.2 | 96.7 | 88.3 | 3692 |
| repetition | 79.4 | 87.1 | 83.1 | 2020 |
| multi-line | 67.9 | 69.6 | 68.7 | 744 |
| accuracy | | | 88.6 | 16285 |
| macro avg | 81.2 | 85.2 | 82.9 | 16285 |
| weighted avg | 89.5 | 88.6 | 88.8 | 16285 |
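
The two summary rows differ only in how the per-class scores are combined: the macro average weights every class equally, while the weighted average weights each class by its support. The sketch below recomputes both from the per-class rows of the table. Because it starts from values already rounded to one decimal, small deviations from the reported averages are expected; the function name `summarize` is an illustrative assumption.

```python
# Per-class (precision, recall, F1, support), copied from the table above.
rows = {
    "single-line": (96.3, 87.3, 91.6, 9829),
    "empty": (81.2, 96.7, 88.3, 3692),
    "repetition": (79.4, 87.1, 83.1, 2020),
    "multi-line": (67.9, 69.6, 68.7, 744),
}

def summarize(rows):
    """Return (macro, weighted) averages of precision, recall, and F1."""
    values = list(rows.values())
    total = sum(v[3] for v in values)
    # Macro: plain mean over classes, regardless of class size.
    macro = tuple(sum(v[k] for v in values) / len(values) for k in range(3))
    # Weighted: mean over classes, each weighted by its support.
    weighted = tuple(sum(v[k] * v[3] for v in values) / total for k in range(3))
    return macro, weighted

macro, weighted = summarize(rows)
print("macro   ", [round(x, 1) for x in macro])
print("weighted", [round(x, 1) for x in weighted])
```

The small multi-line class pulls the macro averages down much more than the weighted ones, which explains the gap between the two rows.
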
Table 9
Comparison of text recognition results for numeric and textual lines, in terms of exact match (EM) and character error rate (CER).
| | EM | CER | AVG. LENGTH | SUPPORT |
|---|---|---|---|---|
| textual | 28.2% | 0.19 | 12.2 chars | 897 |
| numeric | 65.8% | 0.18 | 3.2 chars | 1,232 |
| All | 49.9% | 0.19 | 7.0 chars | 2,129 |
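
For readers unfamiliar with the two metrics, the following Python sketch shows one common way to compute them: EM as the share of lines recognized without any error, and CER as the total character edit distance divided by the total reference length. Whether the paper pools characters over all lines or averages per line is not stated, so the pooling here is an assumption, as are the function names and the sample pairs.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def em_and_cer(pairs):
    """pairs: list of (reference, prediction) strings. Returns (EM, CER)."""
    exact = sum(ref == hyp for ref, hyp in pairs)
    edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return exact / len(pairs), edits / chars

# Hypothetical cell transcriptions: one correct, two with a single error each.
pairs = [("Rautalampi", "Rautalampi"), ("296", "298"), ("piika", "pika")]
em, cer = em_and_cer(pairs)
print(f"EM={em:.3f} CER={cer:.3f}")  # EM=0.333 CER=0.111
```

The example also illustrates why the two metrics can diverge as in the table: a single wrong character makes a line fail EM regardless of its length, so long textual lines can have a much lower EM than short numeric lines at a similar CER.
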
Table 10
Precision, Recall, and F1-score of per-page year mention extraction.
| YEAR EXTRACTION METHOD | PRECISION | RECALL | F1-SCORE |
|---|---|---|---|
| with LLM correction | 91.6 | 83.1 | 87.2 |
| without LLM correction | 89.2 | 80.0 | 84.4 |
Table 11
Proportion of extracted parish names in the Elimäki books for which a known parish name can be found at an edit distance of at most d.
| d = 0 | d ≤ 1 | d ≤ 2 | d ≤ 3 | d ≤ 4 |
|---|---|---|---|---|
| 8% | 23% | 41% | 60% | 72% |
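
The lookup behind these proportions can be sketched as a nearest-neighbor search under edit distance: each extracted name is compared against a list of known parish names, and it counts as matched at threshold d if the closest known name is at most d edits away. The code below is a minimal illustration in Python, not the authors' implementation; the helper names, the short parish list, and the misspelled inputs are assumptions made for the example.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def nearest_parish(name, known):
    """Return (closest known name, its edit distance to `name`)."""
    return min(((k, levenshtein(name, k)) for k in known), key=lambda kd: kd[1])

def share_within(extracted, known, max_d):
    """Fraction of extracted names with a known name at distance <= max_d."""
    hits = sum(nearest_parish(n, known)[1] <= max_d for n in extracted)
    return hits / len(extracted)

# Illustrative inputs only: a tiny parish list and hypothetical OCR outputs.
known = ["Elimäki", "Rautalampi", "Hankasalmi", "Huittinen", "Heinävesi"]
extracted = ["Elimäki", "Rautalambi", "Hankasalm", "Xyzzy"]

for d in range(5):
    print(d, share_within(extracted, known, d))
```

A larger threshold recovers more noisy names but also raises the risk of mapping a name to the wrong parish, so the threshold is a trade-off rather than a free parameter.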

Figure 10
Histograms of departures from and arrivals to Elimäki between 1875 and 1922.

Figure 11
Maps showing the origins and destinations of migration to and from Elimäki between 1875 and 1922.
