
Figure 1
Hittite cities in the scope of the study.

Figure 2
Sample of cropped image parts.

Figure 3
Data transformation pipeline.

Figure 4
Dataset: training data on the left (28 samples for each class) and testing data on the right (6 samples for each class).
Table 1
Software libraries and programming environment.
| LIBRARY | VERSION |
|---|---|
| Python | 3.11 |
| PyTorch | 2.3.0+cu121 |
| Seaborn | 0.13.2 |
| Scikit-learn | 1.2.2 |
| Matplotlib | 3.10.0 |
| Torchvision | 0.18.0+cu121 |
| CUDA | 12.1 |
| Higher | 0.2.1 |
| Evograd | 0.1.2 |
| Easyfsl | 1.5.0 |
| Scipy | 1.15.2 |
| NumPy | 1.25.2 |
| Pandas | 2.2.2 |
| Threadpoolctl | 3.6.0 |
| Joblib | 1.4.2 |
| Google Colab | A100 GPU |
Table 2
Parameters of the models implemented on the initial dataset.
| HYPERPARAMETER | CONVENTIONAL ML MODELS | TRANSFER LEARNING (RESNET18) | HYBRID (MAML+FSL+RESNET18) | SIMPLE CNN + FSL |
|---|---|---|---|---|
| Architecture Specific | ||||
| Base architecture | N/A | ResNet18 (pretrained) | ResNet18 (pretrained) | Custom CNN |
| Frozen layers | N/A | layer1, layer2 | layer1, layer2 | None |
| Dropout rates | N/A | 0.5, 0.3 | 0.5, 0.3 | 0.5 |
| Learning Process | ||||
| Batch size | N/A | 16 | 16 | 8 |
| Number of epochs | N/A | 30 | 20 | 100 |
| Optimization | ||||
| Optimizer | N/A | Adam | Adam | SGD |
| Learning rate | N/A | 0.001 | Multi-tier: 0.001 (fc), 0.0001 (layer4), 0.00001 (layer3) | Multi-tier: 0.001 (conv), 0.01 (fc) |
| Weight decay | N/A | 0.001 | 0.001 | 0.01 |
| Momentum | N/A | N/A | N/A | 0.9 |
| Meta-Learning | ||||
| Meta learning rate | N/A | N/A | 0.0005 | N/A |
| Inner learning rate | N/A | N/A | 0.01 | N/A |
| Inner updates | N/A | N/A | 3 | N/A |
| Meta updates | N/A | N/A | 100 | N/A |
| Regularization | ||||
| L2 lambda | N/A | N/A | 0.001 | 0.001 |
| Gradient clip norm | N/A | N/A | 1.0 | 1.0 |
| Early Stopping | ||||
| Patience | N/A | 5 | 5 | 10 |
| Loss Function | ||||
| Function type | Varies by model | CrossEntropyLoss | CrossEntropyLoss with class weights | Focal Loss |
| Focal Loss gamma | N/A | N/A | N/A | 2 |
| Learning Rate Scheduling | ||||
| Scheduler | N/A | ReduceLROnPlateau | ReduceLROnPlateau | ReduceLROnPlateau |
| Schedule factor | N/A | 0.5 | 0.5 | 0.5 |
| Schedule patience | N/A | 5 | 5 | 5 |
| Model-Specific Parameters | ||||
| SVM | kernel='rbf', random_state=42 | N/A | N/A | N/A |
| KNN | n_neighbors=5 | N/A | N/A | N/A |
| Random Forest | n_estimators=100, random_state=42 | N/A | N/A | N/A |
| Logistic Regression | multi_class='ovr', random_state=42, max_iter=1000 | N/A | N/A | N/A |
| Decision Tree | random_state=42 | N/A | N/A | N/A |
| Naive Bayes | Default parameters | N/A | N/A | N/A |
| Validation | ||||
| Cross-validation folds | 5 (3 for enhanced dataset) | 3 | 3 | N/A |
| Random seed | 42 | 123 | 123 | 123 |
Table 3
Performance comparison of the models, averaged over the four classes in the initial dataset: Sakçagözü, Alacahöyük, Karkamış, Arslantepe.
| MODEL | TRAINING (VAL.%) | TEST% |
|---|---|---|
| Hybrid Model (MAML+FSL+ResNet18) | 73.21 | 81.94 |
| Transfer Learning (Only Pre-Trained ResNet18) | 83.93 | 72.22 |
| Simple CNN + FSL | 58 | 44 |
| Reference | ||
| Human Expert Prediction (Assoc. Prof. in Hittite Art) | | 62.50 |
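The hybrid model's meta-learning settings in Table 2 (inner learning rate 0.01, 3 inner updates, meta learning rate 0.0005, gradient clip norm 1.0) correspond to a MAML loop. Below is a minimal full-MAML sketch on a toy linear model, with random episodes standing in for the few-shot tasks; the `higher` package listed in Table 1 provides the same mechanics for full networks.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)                            # random seed from Table 2
w = torch.randn(4, 16, requires_grad=True)        # toy 4-class linear model
meta_opt = torch.optim.Adam([w], lr=0.0005)       # meta learning rate

def episode():
    """Random support/query sets standing in for one few-shot task."""
    xs, ys = torch.randn(8, 16), torch.randint(0, 4, (8,))
    xq, yq = torch.randn(8, 16), torch.randint(0, 4, (8,))
    return xs, ys, xq, yq

for _ in range(10):                               # study: 100 meta-updates
    xs, ys, xq, yq = episode()
    fast_w = w
    for _ in range(3):                            # 3 inner updates
        inner_loss = F.cross_entropy(xs @ fast_w.t(), ys)
        g, = torch.autograd.grad(inner_loss, fast_w, create_graph=True)
        fast_w = fast_w - 0.01 * g                # inner learning rate 0.01
    # Outer step: query loss through the adapted weights back to w.
    meta_loss = F.cross_entropy(xq @ fast_w.t(), yq)
    meta_opt.zero_grad()
    meta_loss.backward()
    torch.nn.utils.clip_grad_norm_([w], 1.0)      # gradient clip norm 1.0
    meta_opt.step()
```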
Table 4
Conventional machine learning results, averaged for the four classes: Sakçagözü, Alacahöyük, Karkamış, Arslantepe. Corresponding confusion matrices are included in Appendix 2.
| MODEL | TRAINING ACCURACY | TRAINING PRECISION | TRAINING F1 SCORE | TEST ACCURACY | TEST PRECISION | TEST F1 SCORE | AVERAGE CV SCORE |
|---|---|---|---|---|---|---|---|
| Support Vector Machines (SVM) | 0.9167 | 0.9237 | 0.9160 | 0.3929 | 0.4008 | 0.3910 | 0.3433 |
| K-Nearest Neighbors (KNN) | 0.5370 | 0.6401 | 0.5430 | 0.2500 | 0.5094 | 0.2341 | 0.3147 |
| Random Forest (RF) | 1.0000 | 1.0000 | 1.0000 | 0.4286 | 0.4484 | 0.4277 | 0.3152 |
| Logistic Regression (LR) | 1.0000 | 1.0000 | 1.0000 | 0.5000 | 0.5893 | 0.5119 | 0.3970 |
| Decision Tree (DT) | 1.0000 | 1.0000 | 1.0000 | 0.2500 | 0.2989 | 0.2655 | 0.3325 |
| Naive Bayes (NB) | 0.7222 | 0.7616 | 0.7194 | 0.3571 | 0.3662 | 0.3449 | 0.3524 |
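The conventional baselines above can be reproduced with scikit-learn using the model-specific parameters from Table 2. The random stand-in features below are placeholders for the actual image features, which are not specified in this section; `multi_class='ovr'` (Table 2) is noted in a comment since recent scikit-learn versions deprecate that argument.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Model-specific parameters from Table 2.
clfs = {
    "SVM": SVC(kernel="rbf", random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    # Study also sets multi_class='ovr' (deprecated in newer scikit-learn).
    "LR": LogisticRegression(random_state=42, max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=42),
    "NB": GaussianNB(),  # default parameters
}

# Placeholder features: 28 training samples per class x 4 classes (Figure 4).
rng = np.random.default_rng(42)
X = rng.normal(size=(112, 64))
y = np.repeat(np.arange(4), 28)

# 5-fold cross-validation scores, averaged as in the last column of Table 4.
cv_scores = {name: cross_val_score(clf, X, y, cv=5).mean()
             for name, clf in clfs.items()}
```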

Figure 5
Performance metrics for each fold of the models implemented on the initial dataset.
Table 5
Summary of metrics across three folds for the models implemented on the initial dataset.
| METRIC | FOLD 1 (%) | FOLD 2 (%) | FOLD 3 (%) | AVERAGE (%) |
|---|---|---|---|---|
| Accuracy | 87.5 | 85.2 | 88.1 | 86.9 |
| Precision | 78.6 | 80.3 | 79.8 | 79.6 |
| Recall | 81.3 | 79.0 | 80.5 | 80.3 |
| F1 Score | 79.9 | 79.6 | 80.1 | 79.9 |
Table 6
Performance metrics of the hybrid model implemented on the initial dataset (disaggregated results).
| METRIC | SAKÇAGÖZÜ | ALACAHÖYÜK | KARKAMIŞ | ARSLANTEPE | AVERAGE |
|---|---|---|---|---|---|
| Training – Precision | 0.811 | 0.586 | 0.861 | 0.779 | 0.759 |
| Training – Recall | 0.748 | 0.644 | 0.537 | 1.000 | 0.732 |
| Training – F1-Score | 0.770 | 0.604 | 0.649 | 0.876 | 0.725 |
| Test – Precision | 0.841 | 0.804 | 0.743 | 1.000 | 0.847 |
| Test – Recall | 0.778 | 0.833 | 0.778 | 0.889 | 0.820 |
| Test – F1-Score | 0.797 | 0.797 | 0.755 | 0.933 | 0.821 |
Table 7
Performance metrics of the transfer learning model implemented on the initial dataset (disaggregated results).
| METRIC | SAKÇAGÖZÜ | ALACAHÖYÜK | KARKAMIŞ | ARSLANTEPE | AVERAGE |
|---|---|---|---|---|---|
| Training – Precision | 0.9296 | 0.9259 | 0.7196 | 0.8187 | 0.93 |
| Training – Recall | 0.8963 | 0.8889 | 0.7148 | 0.8630 | 0.92 |
| Training – F1-Score | 0.9084 | 0.9063 | 0.7079 | 0.8283 | 0.92 |
| Test – Precision | 1.0000 | 0.7222 | 0.5983 | 0.8056 | 0.84 |
| Test – Recall | 0.7778 | 0.8333 | 0.7778 | 0.5000 | 0.83 |
| Test – F1-Score | 0.8586 | 0.7667 | 0.6253 | 0.5889 | 0.83 |
Table 8
Performance metrics of the simple CNN + FSL model implemented on the initial dataset (disaggregated results).
| METRIC | SAKÇAGÖZÜ | ALACAHÖYÜK | KARKAMIŞ | ARSLANTEPE | AVERAGE |
|---|---|---|---|---|---|
| Training – Precision | 0.63 | 0.54 | 0.75 | 0.45 | 0.59 |
| Training – Recall | 0.93 | 0.46 | 0.43 | 0.50 | 0.58 |
| Training – F1-Score | 0.75 | 0.50 | 0.55 | 0.47 | 0.56 |
| Test – Precision | 0.79 | 0.15 | 0.50 | 0.31 | 0.43 |
| Test – Recall | 0.92 | 0.17 | 0.33 | 0.33 | 0.43 |
| Test – F1-Score | 0.85 | 0.16 | 0.40 | 0.32 | 0.43 |
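The simple CNN + FSL configuration trains with Focal Loss at gamma = 2 (Table 2). A minimal sketch of that loss follows; the toy logits and targets are placeholders for illustration. With gamma = 0 it reduces to plain cross-entropy, and gamma > 0 down-weights well-classified examples.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss: scales cross-entropy by (1 - p_true)^gamma."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    p = torch.exp(-ce)                 # probability assigned to true class
    return ((1.0 - p) ** gamma * ce).mean()

# Toy 4-class logits (placeholders, not study data).
logits = torch.tensor([[2.0, 0.5, 0.1, 0.1],
                       [0.2, 0.1, 1.5, 0.3]])
targets = torch.tensor([0, 2])
loss = focal_loss(logits, targets, gamma=2.0)
```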
Table 9
Performance comparison of the models implemented on the initial and enhanced datasets.
| MODEL | INITIAL – TRAINING ACCURACY | INITIAL – TEST ACCURACY | INITIAL – CV SCORE | ENHANCED – TRAINING ACCURACY | ENHANCED – TEST ACCURACY | ENHANCED – CV SCORE | TEST ACCURACY IMPROVEMENT |
|---|---|---|---|---|---|---|---|
| Statistical | |||||||
| SVM | 91.67% | 39.29% | 34.33% | 98.68% | 55.36% | 45.32% | +16.07% |
| KNN | 53.70% | 25.00% | 31.47% | 55.92% | 55.36% | 44.08% | +30.36% |
| Random Forest | 100.00% | 42.86% | 31.52% | 100.00% | 55.36% | 45.32% | +12.50% |
| Logistic Regression | 100.00% | 50.00% | 39.70% | 100.00% | 57.14% | 48.01% | +7.14% |
| Decision Tree | 100.00% | 25.00% | 33.25% | 100.00% | 33.93% | 35.54% | +8.93% |
| Naive Bayes | 72.22% | 35.71% | 35.24% | 62.50% | 44.64% | 42.05% | +8.93% |
| ANN-Based |||||||
| Simple CNN + FSL | 58.00% | 44.00% | N/A | 90.44% | 43.45% | N/A | –0.55% |
| Hybrid Model | 73.21% | 81.94% | N/A | 93.43% | 82.74% | N/A | +0.80% |
| Transfer Learning | 83.93% | 72.22% | N/A | 83.58% | 75.60% | N/A | +2.71% |
| Reference | |||||||
| Human Expert Prediction (Assoc. Prof. in Hittite Art) | | 62.50% | | | 85.70% | | |
Table 10
Class-specific performance on the enhanced dataset test set.
| CLASS | TRANSFER LEARNING – PRECISION | TRANSFER LEARNING – RECALL | TRANSFER LEARNING – F1-SCORE | HYBRID – PRECISION | HYBRID – RECALL | HYBRID – F1-SCORE |
|---|---|---|---|---|---|---|
| Alacahöyük | 0.61 | 0.78 | 0.68 | 1.00 | 0.79 | 0.88 |
| Arslantepe | 0.83 | 0.71 | 0.76 | 0.81 | 0.93 | 0.87 |
| Karkamış | 0.81 | 0.64 | 0.72 | 0.71 | 0.71 | 0.71 |
| Sakçagözü | 0.93 | 1.00 | 0.96 | 0.93 | 1.00 | 0.97 |
| Average | 0.79 | 0.78 | 0.78 | 0.86 | 0.86 | 0.86 |
Table 11
Comparative analysis of per-class predictions on the enhanced dataset test set.
| CLASS | HUMAN EXPERT | TRANSFER LEARNING | HYBRID MODEL |
|---|---|---|---|
| Overall Accuracy | 85.7% | 75.6% | 82.74% |
| Alacahöyük | 12/14 (85.7%) | 10/14 (71.4%) | 11/14 (78.6%) |
| Arslantepe | 11/14 (78.6%) | 10/14 (71.4%) | 13/14 (92.9%) |
| Karkamış | 11/14 (78.6%) | 9/14 (64.3%) | 10/14 (71.4%) |
| Sakçagözü | 14/14 (100%) | 13/14 (92.9%) | 14/14 (100%) |

Figure 6
Grad-CAM, Guided Backpropagation, and Guided Grad-CAM applied to the trained model without the background class.
Table 12
Comparison of performance metrics of the nested cross-validation models with and without the background class.
| MODEL | BEST TRAINING MEAN ACCURACY | TESTING MEAN ACCURACY |
|---|---|---|
| 4-class model | 0.9275 | 0.7778 |
| 5-class model | 0.8059 | 0.6518 |

Figure 7
Grad-CAM, Guided Backpropagation, and Guided Grad-CAM applied to the trained model with the background class.
