
Figure 1
REPSASM System Architecture.

Figure 2
REPSASM Schema Matching Architecture.

Figure 3
Classifying a Column Using the Match Tree.

Table 1
Column mapping between CKAN and CERIF.
| A | is_source_of_has_classification_has_term |
| B | is_destination_of_has_source_is_source_of_has_destination_has_URI |
| C | is_destination_of_has_source_is_source_of_has_destination_type |
| D | is_source_of_has_destination_type |
| E | is_destination_of_has_source_is_source_of_has_endDate |
| F | has_identifier_is_source_of_has_endDate |
| H | has_identifier_has_id_value |
| I | is_destination_of_has_source_has_identifier_has_URI |
| J | is_destination_of_has_source_has_identifier_type |
| K | is_destination_of_type |
| M | is_destination_of_has_source_is_source_of_has_destination_has_name |
| N | is_destination_of_has_source_type |
| O | is_destination_of_has_classification_type |
| P | has_identifier_has_URI |
| Q | is_source_of_has_classification_type |
| R | has_identifier_is_source_of_has_classification_type |
| S | is_destination_of_has_endDate |
| T | is_destination_of_has_startDate |
| V | is_destination_of_has_source_has_identifier_has_id_value |
| X | is_source_of_has_endDate |
| Y | has_identifier_is_source_of_has_startDate |
| a | is_destination_of_has_source_is_source_of_has_classification_type |
| b | is_destination_of_has_source_is_source_of_type |
| c | is_destination_of_has_source_is_source_of_has_startDate |
| d | has_identifier_is_source_of_type |
| e | is_source_of_has_startDate |
| has_description | has_description |
| has_identifier_label | has_identifier_label |
| has_identifier_type | has_identifier_type |
| has_name | has_name |
| is_source_of_type | is_source_of_type |
| label | label |
| type | type |
| unknown | unknown |

Figure 4
Number of instances per class in the CERIF learn set.

Figure 5
Number of instances per class in the CKAN test set.

Figure 6
Inlier test on the CKAN-CERIF dataset. Accuracy was averaged over 5 tests with 31 classes. Number of simulated columns per class: 15.

Figure 7
Outlier test on the CKAN-CERIF dataset. Scores were averaged over 5 tests with 31 classes. Number of simulated columns per class: 15.

Figure 8
Finalized Fitted Pipeline for the CKAN-CERIF dataset.

Figure 9
Inlier test on the CKAN-CERIF dataset. Accuracy was averaged over 3 tests.

Figure 10
Outlier test on the CKAN-CERIF dataset. Scores were averaged over 3 tests.
