
Archaeological Classification of Small Datasets Using Meta- and Transfer Learning Methods: A Case Study on Hittite Stele Fragments

Open Access | Jan 2026

Figures & Tables

jcaa-9-1-196-g1.jpg
Figure 1

Hittite cities in the scope of the study.

jcaa-9-1-196-g2.jpg
Figure 2

Cropped part sample of images.

jcaa-9-1-196-g3.jpg
Figure 3

Data transformation pipeline.

jcaa-9-1-196-g4.jpg
Figure 4

Dataset: training data on the left (28 samples for each class) and testing data on the right (6 samples for each class).

Table 1

Software libraries and programming environment.

LIBRARY | VERSION
Python | 3.11
PyTorch | 2.3.0+cu121
Seaborn | 0.13.2
Scikit-learn | 1.2.2
Matplotlib | 3.10.0
Torchvision | 0.18.0+cu121
CUDA | 12.1
Higher | 0.2.1
Evograd | 0.1.2
Easyfsl | 1.5.0
Scipy | 1.15.2
NumPy | 1.25.2
Pandas | 2.2.2
Threadpoolctl | 3.6.0
Joblib | 1.4.2
Google Colab | A100 GPU
Table 2

Hyperparameters of the models trained on the initial dataset.

HYPERPARAMETER | CONVENTIONAL ML MODELS | TRANSFER LEARNING (RESNET18) | HYBRID (MAML+FSL+RESNET18) | SIMPLE CNN + FSL
Architecture Specific
Base architecture | N/A | ResNet18 (pretrained) | ResNet18 (pretrained) | Custom CNN
Frozen layers | N/A | layer1, layer2 | layer1, layer2 | None
Dropout rates | N/A | 0.5, 0.3 | 0.5, 0.3 | 0.5
Learning Process
Batch size | N/A | 16 | 16 | 8
Number of epochs | N/A | 30 | 20 | 100
Optimization
Optimizer | N/A | Adam | Adam | SGD
Learning rate | N/A | 0.001 | Multi-tier: 0.001 (fc), 0.0001 (layer4), 0.00001 (layer3) | Multi-tier: 0.001 (conv), 0.01 (fc)
Weight decay | N/A | 0.001 | 0.001 | 0.01
Momentum | N/A | N/A | N/A | 0.9
Meta-Learning
Meta learning rate | N/A | N/A | 0.0005 | N/A
Inner learning rate | N/A | N/A | 0.01 | N/A
Inner updates | N/A | N/A | 3 | N/A
Meta updates | N/A | N/A | 100 | N/A
Regularization
L2 lambda | N/A | N/A | 0.001 | 0.001
Gradient clip norm | N/A | N/A | 1.0 | 1.0
Early Stopping
Patience | N/A | 5 | 5 | 10
Loss Function
Function type | Varies by model | CrossEntropyLoss | CrossEntropyLoss with class weights | Focal Loss
Focal Loss gamma | N/A | N/A | N/A | 2
Learning Rate Scheduling
Scheduler | N/A | ReduceLROnPlateau | ReduceLROnPlateau | ReduceLROnPlateau
Schedule factor | N/A | 0.5 | 0.5 | 0.5
Schedule patience | N/A | 5 | 5 | 5
Model-Specific Parameters
SVM | kernel='rbf', random_state=42 | N/A | N/A | N/A
KNN | n_neighbors=5 | N/A | N/A | N/A
Random Forest | n_estimators=100, random_state=42 | N/A | N/A | N/A
Logistic Regression | multi_class='ovr', random_state=42, max_iter=1000 | N/A | N/A | N/A
Decision Tree | random_state=42 | N/A | N/A | N/A
Naive Bayes | Default parameters | N/A | N/A | N/A
Validation
Cross-validation folds | 5 (3 for enhanced version) | 3 | 3 | N/A
Random seed | 42 | 123 | 123 | 123
Table 3

Performance comparison of the models, averaged over the four classes in the initial dataset: Sakçagözü, Alacahöyük, Karkamış, Arslantepe.

MODEL | TRAINING (VAL. %) | TEST %
Hybrid Model (MAML+FSL+ResNet18) | 73.21 | 81.94
Transfer Learning (Only Pre-Trained ResNet18) | 83.93 | 72.22
Simple CNN + FSL | 58 | 44
Reference
Human Expert Prediction (Assoc. Prof. in Hittite Art) | N/A | 62.50
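The hybrid model's meta-learning component can be illustrated with a toy first-order MAML loop using the meta-parameters listed in Table 2 (inner learning rate 0.01, 3 inner updates, meta learning rate 0.0005). The linear model and random episodes below are synthetic placeholders, not the paper's ResNet18 pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(16, 4)                     # stand-in for the classifier head
meta_opt = torch.optim.Adam(model.parameters(), lr=5e-4)  # meta learning rate

def adapt(xs, ys, inner_lr=0.01, steps=3):
    """Return fast weights after `steps` inner-loop updates on the support set."""
    fast = {name: p for name, p in model.named_parameters()}
    for _ in range(steps):
        loss = F.cross_entropy(F.linear(xs, fast["weight"], fast["bias"]), ys)
        grads = torch.autograd.grad(loss, list(fast.values()))
        fast = {name: p - inner_lr * g
                for (name, p), g in zip(fast.items(), grads)}
    return fast

for _ in range(10):                          # the paper runs 100 meta-updates
    # One few-shot episode: support set (adaptation) and query set (meta-loss)
    xs, ys = torch.randn(8, 16), torch.randint(0, 4, (8,))
    xq, yq = torch.randn(8, 16), torch.randint(0, 4, (8,))
    fast = adapt(xs, ys)
    meta_loss = F.cross_entropy(F.linear(xq, fast["weight"], fast["bias"]), yq)
    meta_opt.zero_grad()
    meta_loss.backward()                     # first-order: inner grads are detached
    meta_opt.step()
```

Because `torch.autograd.grad` is called without `create_graph=True`, this is the first-order MAML variant; the full second-order version (as implemented by libraries such as `higher`, listed in Table 1) backpropagates through the inner updates as well.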
Table 4

Conventional machine learning results, averaged over the four classes: Sakçagözü, Alacahöyük, Karkamış, Arslantepe. Corresponding confusion matrices are included in Appendix 2.

MODEL | TRAINING ACCURACY | TRAINING PRECISION | TRAINING F1 SCORE | TEST ACCURACY | TEST PRECISION | TEST F1 SCORE | AVERAGE CV SCORE
Support Vector Machines (SVM) | 0.9167 | 0.9237 | 0.9160 | 0.3929 | 0.4008 | 0.3910 | 0.3433
K-Nearest Neighbors (KNN) | 0.5370 | 0.6401 | 0.5430 | 0.2500 | 0.5094 | 0.2341 | 0.3147
Random Forest (RF) | 1.0000 | 1.0000 | 1.0000 | 0.4286 | 0.4484 | 0.4277 | 0.3152
Logistic Regression (LR) | 1.0000 | 1.0000 | 1.0000 | 0.5000 | 0.5893 | 0.5119 | 0.3970
Decision Tree (DT) | 1.0000 | 1.0000 | 1.0000 | 0.2500 | 0.2989 | 0.2655 | 0.3325
Naive Bayes (NB) | 0.7222 | 0.7616 | 0.7194 | 0.3571 | 0.3662 | 0.3449 | 0.3524
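The six conventional models above use the scikit-learn parameters listed in Table 2. The sketch below instantiates them and runs a 5-fold cross-validation on synthetic data; the feature matrix is a random stand-in, since the study actually classifies features extracted from the stele photographs. (The paper's scikit-learn 1.2.2 also sets `multi_class='ovr'` on LogisticRegression; that argument is deprecated in newer releases, so it is omitted here.)

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# The six models with the parameters listed in Table 2
models = {
    "SVM": SVC(kernel="rbf", random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(random_state=42, max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}

# Synthetic 4-class stand-in for the image feature vectors
X, y = make_classification(n_samples=140, n_features=64, n_classes=4,
                           n_informative=16, random_state=42)

# Mean 5-fold CV accuracy per model (cf. the "Average CV Score" column)
cv_scores = {name: cross_val_score(m, X, y, cv=5).mean()
             for name, m in models.items()}
```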
jcaa-9-1-196-g5.png
Figure 5

Per-fold performance metrics of the models trained on the initial dataset.

Table 5

Summary of metrics across three folds for the models trained on the initial dataset.

METRIC | FOLD 1 (%) | FOLD 2 (%) | FOLD 3 (%) | AVERAGE (%)
Accuracy | 87.5 | 85.2 | 88.1 | 86.9
Precision | 78.6 | 80.3 | 79.8 | 79.6
Recall | 81.3 | 79 | 80.5 | 80.3
F1 Score | 79.9 | 79.6 | 80.1 | 79.9
Table 6

Performance metrics of the hybrid model on the initial dataset. Disaggregated results.

METRIC | SAKÇAGÖZÜ | ALACAHÖYÜK | KARKAMIŞ | ARSLANTEPE | AVERAGE
Training – Precision | 0.811 | 0.586 | 0.861 | 0.779 | 0.759
Training – Recall | 0.748 | 0.644 | 0.537 | 1.000 | 0.732
Training – F1-Score | 0.770 | 0.604 | 0.649 | 0.876 | 0.725
Test – Precision | 0.841 | 0.804 | 0.743 | 1.000 | 0.847
Test – Recall | 0.778 | 0.833 | 0.778 | 0.889 | 0.820
Test – F1-Score | 0.797 | 0.797 | 0.755 | 0.933 | 0.821
Table 7

Performance metrics of the transfer learning model on the initial dataset. Disaggregated results.

METRIC | SAKÇAGÖZÜ | ALACAHÖYÜK | KARKAMIŞ | ARSLANTEPE | AVERAGE
Training – Precision | 0.9296 | 0.9259 | 0.7196 | 0.8187 | 0.93
Training – Recall | 0.8963 | 0.8889 | 0.7148 | 0.8630 | 0.92
Training – F1-Score | 0.9084 | 0.9063 | 0.7079 | 0.8283 | 0.92
Test – Precision | 1 | 0.7222 | 0.5983 | 0.8056 | 0.84
Test – Recall | 0.7778 | 0.8333 | 0.7778 | 0.50 | 0.83
Test – F1-Score | 0.8586 | 0.7667 | 0.6253 | 0.5889 | 0.83
Table 8

Performance metrics of the simple CNN + FSL model on the initial dataset. Disaggregated results.

METRIC | SAKÇAGÖZÜ | ALACAHÖYÜK | KARKAMIŞ | ARSLANTEPE | AVERAGE
Training – Precision | 0.63 | 0.54 | 0.75 | 0.45 | 0.59
Training – Recall | 0.93 | 0.46 | 0.43 | 0.50 | 0.58
Training – F1-Score | 0.75 | 0.50 | 0.55 | 0.47 | 0.56
Test – Precision | 0.79 | 0.15 | 0.50 | 0.31 | 0.43
Test – Recall | 0.92 | 0.17 | 0.33 | 0.33 | 0.43
Test – F1-Score | 0.85 | 0.16 | 0.40 | 0.32 | 0.43
Table 9

Performance comparison of the models on the initial and enhanced datasets.

MODEL | INITIAL: TRAINING ACC. | INITIAL: TEST ACC. | INITIAL: CV SCORE | ENHANCED: TRAINING ACC. | ENHANCED: TEST ACC. | ENHANCED: CV SCORE | TEST ACC. IMPROVEMENT
Statistical
SVM | 91.67% | 39.29% | 34.33% | 98.68% | 55.36% | 45.32% | +16.07%
KNN | 53.70% | 25.00% | 31.47% | 55.92% | 55.36% | 44.08% | +30.36%
Random Forest | 100.00% | 42.86% | 31.52% | 100.00% | 55.36% | 45.32% | +12.50%
Logistic Regression | 100.00% | 50.00% | 39.70% | 100.00% | 57.14% | 48.01% | +7.14%
Decision Tree | 100.00% | 25.00% | 33.25% | 100.00% | 33.93% | 35.54% | +8.93%
Naive Bayes | 72.22% | 35.71% | 35.24% | 62.50% | 44.64% | 42.05% | +8.93%
ANNs Based
Simple CNN + FSL | 58.00% | 44.00% | N/A | 90.44% | 43.45% | N/A | –0.55%
Hybrid Model | 73.21% | 81.94% | N/A | 93.43% | 82.74% | N/A | +0.80%
Transfer Learning | 83.93% | 72.22% | N/A | 83.58% | 75.60% | N/A | +2.71%
Reference
Human Expert Prediction (Assoc. Prof. in Hittite Art) | N/A | 62.50% | N/A | N/A | 85.70% | N/A | N/A
Table 10

Class-specific test performance on the enhanced dataset.

CLASS | TL PRECISION | TL RECALL | TL F1-SCORE | HYBRID PRECISION | HYBRID RECALL | HYBRID F1-SCORE
Alacahöyük | 0.61 | 0.78 | 0.68 | 1.00 | 0.79 | 0.88
Aslantepe | 0.83 | 0.71 | 0.76 | 0.81 | 0.93 | 0.87
Karkamış | 0.81 | 0.64 | 0.72 | 0.71 | 0.71 | 0.71
Sakçagözü | 0.93 | 1 | 0.96 | 0.93 | 1.00 | 0.97
Average | 0.79 | 0.78 | 0.78 | 0.86 | 0.86 | 0.86
TL = Transfer Learning (ResNet18); Hybrid = MAML+FSL+ResNet18.
Table 11

Per-class comparison of test predictions on the enhanced dataset.

CLASS | HUMAN EXPERT | TRANSFER LEARNING | HYBRID MODEL
Overall Accuracy | 85.7% | 75.6% | 82.74%
Alacahöyük | 12/14 (85.7%) | 10/14 (71.4%) | 11/14 (78.6%)
Aslantepe | 11/14 (78.6%) | 10/14 (71.4%) | 13/14 (92.9%)
Karkamış | 11/14 (78.6%) | 9/14 (64.3%) | 10/14 (71.4%)
Sakçagözü | 14/14 (100%) | 13/14 (92.9%) | 14/14 (100%)
jcaa-9-1-196-g6.jpg
Figure 6

Grad-CAM, Guided Backpropagation, and Guided Grad-CAM applied on the trained model without the background class.

Table 12

Comparison of performance metrics of nested CV models with and without the background class.

MODEL | BEST TRAINING MEAN ACCURACY | TESTING MEAN ACCURACY
4-class model | 0.9275 | 0.7778
5-class model | 0.8059 | 0.6518
jcaa-9-1-196-g7.jpg
Figure 7

Grad-CAM, Guided Backpropagation, and Guided Grad-CAM applied on the trained model with the background class.

DOI: https://doi.org/10.5334/jcaa.196 | Journal eISSN: 2514-8362
Language: English
Submitted on: Jan 8, 2025
|
Accepted on: Dec 3, 2025
|
Published on: Jan 30, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Deniz Kayıkcı, Iban Berganzo-Besga, Juan Antonio Barceló, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.

Volume 9 (2026): Issue 1