Table 1
Description of the DL-based CV algorithms tested in the present study, including the number of parameters (in millions) in each algorithm when trained from scratch, the number of convolutional layers (Nº Conv), whether Batch Normalisation (B.N.) layers are included, and whether pretrained weights are available in TensorFlow for the purpose of transfer learning. The DS columns indicate which of the three datasets in this study were originally used to train each algorithm: Abellá, Baquedano & Domínguez-Rodrigo (2022) (DS1), Domínguez-Rodrigo et al. (2020) (DS2), and Pizarro-Monzo et al. (2023) (DS3). The references column cites the original publication of each algorithm. * This number includes all of the convolutional layers within the inception modules (large blocks of multiple convolutional layers). ** A precise number cannot be reported as this architecture works differently from the others.
| ALGORITHM | PARAMS. | Nº CONV | B.N. | PRETRAINED | DS1 | DS2 | DS3 | REFERENCES |
|---|---|---|---|---|---|---|---|---|
| Jason1 | ≈ 3 Mil. | 8 | False | False | x | | | Brownlee, 2017 |
| Jason2 | ≈ 3 Mil. | 8 | True | False | x | x | | Brownlee, 2017 |
| VGG16 | ≈ 15 Mil. | 13 | False | True | x | | | Simonyan & Zisserman, 2015 |
| DenseNet201 | ≈ 18 Mil. | 201 | True | True | x | x | x | Huang et al., 2017 |
| VGG19 | ≈ 20 Mil. | 16 | False | True | x | x | | Simonyan & Zisserman, 2015 |
| InceptionV3 | ≈ 22 Mil. | 96 * | True | True | x | | | Szegedy et al., 2015 |
| ResNet50 | ≈ 24 Mil. | 49 | True | True | x | x | x | He et al., 2016 |
| AlexNet | ≈ 35 Mil. | 5 | True | False | x | | | Krizhevsky et al., 2012 |
| EfficientNetB7 | ≈ 64 Mil. | ** | True | True | x | | | Tan & Le, 2020 |
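The parameter counts in Table 1 are dominated by the convolutional layers: a standard 2D convolution with k×k kernels, c_in input channels and c_out filters contributes (k·k·c_in + 1)·c_out trainable weights. A minimal sketch of this accounting; the example layer shapes are the first four convolutional layers of a VGG-style network:

```python
def conv2d_params(kernel_size: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Number of trainable parameters in a standard 2D convolutional layer."""
    weights = kernel_size * kernel_size * c_in * c_out
    return weights + (c_out if bias else 0)

def batchnorm_params(channels: int) -> int:
    """Trainable parameters (gamma and beta) of a Batch Normalisation layer."""
    return 2 * channels

# Illustrative example: the first two convolutional blocks of a VGG-style network,
# operating on 3-channel RGB input with 3x3 kernels.
layers = [conv2d_params(3, 3, 64), conv2d_params(3, 64, 64),
          conv2d_params(3, 64, 128), conv2d_params(3, 128, 128)]
print(sum(layers))  # 260160 parameters in these four layers alone
```

Fully connected classification heads add the remainder, which is why architectures with large dense layers can exceed deeper but fully convolutional ones in total parameters.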

Figure 1
Examples of ideal train/validation learning curves. These curves were obtained from a neural network trained on a toy dataset, and are shown alongside example learning curves from underfitting and overfitting neural networks.
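The diagnoses illustrated in Figure 1 can be approximated programmatically: overfitting shows validation loss rebounding away from its minimum while the train/validation gap widens, and underfitting shows training loss that never comes down. A crude sketch; the window size and thresholds here are illustrative, not values used in the study:

```python
def diagnose_curves(train_loss, val_loss, window=3, gap_tol=0.1):
    """Crude learning-curve diagnosis; thresholds are illustrative only.

    Compares the mean validation loss over the last `window` epochs
    against the best validation loss seen during training.
    """
    recent_val = sum(val_loss[-window:]) / window
    best_val = min(val_loss)
    gap = val_loss[-1] - train_loss[-1]
    if recent_val > best_val + gap_tol and gap > gap_tol:
        return "overfitting"    # validation loss has rebounded away from its minimum
    if train_loss[-1] > 1.0:    # training loss itself never came down (toy threshold)
        return "underfitting"
    return "good fit"

# Toy curves: training loss keeps dropping while validation loss turns upward.
train = [2.0, 1.2, 0.8, 0.5, 0.3, 0.2, 0.1]
val   = [2.1, 1.3, 0.9, 0.7, 0.8, 1.0, 1.2]
print(diagnose_curves(train, val))  # overfitting
```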
Table 2
Descriptive statistics of the image-quality metrics extracted from images of the different classes present in DS1 and DS3; DS2 was excluded from the table as it is contained within DS3. Metrics include the Laplacian of Gaussian (LoG) variance, Fast Fourier Transform (FFT) magnitudes, the percentage of each image presenting detectable features using Canny Edge Detection (CED), the percentage of images presenting adequate levels of contrast, and the percentage of images presenting complications due to the presence of specularities (Spec.). Descriptive statistics report the central tendency followed by 95% confidence intervals constructed from distribution quantiles. CM = Cut Mark, Croc. = Crocodylian Tooth Score, TM = Carnivoran Tooth Mark, Tmp = Trampling.
| SAMPLE | LOG VARIANCE | FFT MAGNITUDE | CED (%) | CONTRAST (%) | SPEC. (%) |
|---|---|---|---|---|---|
| DS1-CM | 20.6 [13.8, 55.6] | 9.1 [0.6, 24.1] | 33.4 [16.3, 56.7] | 45.0 | 11.3 |
| DS1-Croc. | 71.0 [17.8, 230.8] | 19.0 [5.7, 30.7] | 49.9 [26.8, 65.7] | 95.7 | 58.7 |
| DS3-CM | 22.1 [14.1, 70.2] | 8.7 [0.7, 22.9] | 35.7 [16.6, 66.3] | 29.6 | 9.8 |
| DS3-TM | 41.9 [13.6, 133.3] | 16.2 [–0.2, 28.0] | 48.3 [14.6, 68.2] | 80.9 | 62.1 |
| DS3-Tmp | 44.1 [10.2, 378.4] | 9.7 [–6.7, 93.2] | 45.2 [2.7, 87.8] | 47.3 | 40.0 |
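The LoG variance in Table 2 is a standard focus measure: the variance of the image's response to a Laplacian kernel, where low values indicate blur and few detectable features. A minimal numpy sketch of this metric, together with a simple contrast check; the percentile-spread contrast criterion and its threshold are illustrative stand-ins, not necessarily those used in the study:

```python
import numpy as np

LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)

def laplacian_variance(img: np.ndarray) -> float:
    """Variance of the Laplacian response: a common sharpness/focus measure."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):              # valid-mode 3x3 convolution via shifted slices
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())

def has_adequate_contrast(img: np.ndarray, min_spread: float = 50.0) -> bool:
    """Illustrative contrast check: intensity spread between 5th/95th percentiles."""
    lo, hi = np.percentile(img, [5, 95])
    return (hi - lo) >= min_spread

flat = np.full((64, 64), 128.0)           # featureless grey image
noisy = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(float)
print(laplacian_variance(flat))           # 0.0 -- no detectable features
print(laplacian_variance(noisy) > 1000)   # True -- high-frequency content
```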

Figure 2
Examples of images displaying exceptionally poor quality. Examples of photographs of cut marks from DS1 and DS3 presenting especially poor image quality, with a considerable portion of pixels out of focus towards the image border. Sobel gradient maps in the right-hand panels highlight these features: sharp changes in gradient are clearly visible in the centre of each image, while the regions towards the edges present a high degree of out-of-focus blur with almost no detectable features.
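The gradient maps in Figure 2 come from the Sobel operator, which estimates horizontal and vertical intensity gradients and combines them into a magnitude map; blurred regions respond weakly. A numpy sketch of that computation:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def _conv3(img, kernel):
    """Valid-mode 3x3 convolution implemented with shifted slices."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out

def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude map; sharp regions respond strongly, blur yields ~0."""
    gx, gy = _conv3(img, SOBEL_X), _conv3(img, SOBEL_Y)
    return np.hypot(gx, gy)

# A vertical step edge produces strong gradients at the edge and none elsewhere.
img = np.zeros((8, 8)); img[:, 4:] = 255.0
mag = sobel_magnitude(img)
print(mag[:, 0].max(), mag[:, 2].max())  # 0.0 in the flat region, 1020.0 at the edge
```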

Figure 3
Examples of photographs presenting specular reflections. Examples of photographs of tooth marks from DS1 and DS3 presenting areas of abnormally intense brightness in certain pixels as a product of specular reflections. The right-hand panels mark the pixels where these abnormalities were detected.
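Specular highlights of the kind shown in Figure 3 are commonly flagged as near-saturated pixels. A minimal sketch assuming a simple intensity threshold on an 8-bit greyscale image; the threshold value is illustrative, not necessarily the one used in the study:

```python
import numpy as np

def specular_mask(img: np.ndarray, threshold: int = 240) -> np.ndarray:
    """Boolean mask of near-saturated pixels, a simple proxy for specularities."""
    return img >= threshold

def specular_fraction(img: np.ndarray, threshold: int = 240) -> float:
    """Fraction of the image flagged as specular."""
    return float(specular_mask(img, threshold).mean())

img = np.full((100, 100), 120.0)    # mid-grey background
img[40:60, 40:60] = 255.0           # a saturated 20x20 reflection patch
print(specular_fraction(img))       # 0.04 -> 4% of pixels flagged
```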
Table 3
Evaluation metrics when predicting the different types of BSMs using the models trained on each dataset and evaluated on the test set. Note that a high accuracy value does not necessarily imply good performance, as evidenced by the (imbalanced) DS1 dataset.
| | DS1 | DS2 | DS3 |
|---|---|---|---|
| Precision | 0.46 | 0.77 | 0.88 |
| Recall | 0.50 | 0.69 | 0.87 |
| F1 | 0.48 | 0.66 | 0.88 |
| Accuracy | 0.92 | 0.86 | 0.91 |
Table 4
Error rates (in %) when predicting the different types of BSMs using the models trained on each dataset and evaluated on the test sets. Error rates are reported as the RMSE of the labels.
| | DS1 | DS2 | DS3 |
|---|---|---|---|
| Tooth Score | – | 15.34 | 7.95 |
| Trampling | – | 55.30 | 22.35 |
| Cut Mark | 5.29 | 10.14 | 7.90 |
| Crocodile | 24.41 | – | – |
| Overall | 8.63 | 15.13 | 9.77 |
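Table 4's caption describes error rates as the RMSE of the labels. Assuming this means the root-mean-square error between one-hot encoded true labels and the network's predicted class probabilities, expressed as a percentage, the computation might look like the following sketch; the toy labels and probabilities are purely illustrative:

```python
import numpy as np

def rmse_percent(y_true_onehot: np.ndarray, y_prob: np.ndarray) -> float:
    """RMSE between one-hot labels and predicted probabilities, as a percentage."""
    return float(np.sqrt(np.mean((y_true_onehot - y_prob) ** 2)) * 100)

# Toy example: three samples, two classes, reasonably confident predictions.
y_true = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.8, 0.2]])
print(round(rmse_percent(y_true, y_prob), 2))  # 17.32
```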
Table 5
Confusion matrix obtained when evaluating a Jason2 model trained on DS1 and evaluated on its test set. Note that the confusion matrix presents a true positive rate of 0 for the crocodile class; the algorithm classifies all samples as cut marks regardless of their true class.
| TRUE CLASS | PREDICTED: CROCODILE | PREDICTED: CUT MARK |
|---|---|---|
| Crocodile | 0 | 13 |
| Cut Mark | 0 | 146 |
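The degenerate behaviour in Table 5 explains the DS1 column of Table 3: with every sample predicted as a cut mark, accuracy remains 146/159 ≈ 0.92 while the macro-averaged metrics collapse. The sketch below recomputes those metrics from the confusion matrix; the undefined precision of the never-predicted class is treated as 0 here:

```python
def macro_metrics(cm):
    """Macro-averaged precision/recall/F1 and accuracy from a confusion matrix.

    cm[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))   # column sum
        actual = sum(cm[k])                           # row sum
        p = tp / predicted if predicted else 0.0      # 0 when class never predicted
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    avg = lambda xs: sum(xs) / len(xs)
    return avg(precisions), avg(recalls), avg(f1s), accuracy

# Table 5: rows = true (Crocodile, Cut Mark), columns = predicted, same order.
p, r, f1, acc = macro_metrics([[0, 13], [0, 146]])
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 2))  # 0.46 0.5 0.48 0.92
```

These values reproduce the DS1 column of Table 3 exactly, confirming that its apparently high accuracy is an artefact of class imbalance.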

Figure 4
Empirical learning curves for neural networks. Learning curves obtained from the best-performing convolutional neural network architecture on each dataset: Jason2 for DS1, VGG16 for DS2, and DenseNet201 for DS3.
Table 6
Confusion matrix obtained when evaluating VGG16 on the test set of DS2 and DenseNet201 on the test set of DS3.
| TRUE CLASS | DS2: CUT MARK | DS2: SCORE | DS2: TRAMPLING | DS3: CUT MARK | DS3: SCORE | DS3: TRAMPLING |
|---|---|---|---|---|---|---|
| Cut Mark | 134 | 11 | 1 | 163 | 2 | 6 |
| Score | 2 | 28 | 0 | 7 | 126 | 3 |
| Trampling | 1 | 13 | 4 | 1 | 11 | 33 |
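The matrices in Table 6 are consistent with the accuracy row of Table 3: summing the diagonal (correct predictions) over the total gives 166/194 ≈ 0.86 for DS2 and 322/352 ≈ 0.91 for DS3. A quick check:

```python
def accuracy_from_cm(cm):
    """Overall accuracy: diagonal (correct predictions) over all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Rows = true (Cut Mark, Score, Trampling); columns = predicted, same order.
ds2 = [[134, 11, 1], [2, 28, 0], [1, 13, 4]]
ds3 = [[163, 2, 6], [7, 126, 3], [1, 11, 33]]
print(round(accuracy_from_cm(ds2), 2), round(accuracy_from_cm(ds3), 2))  # 0.86 0.91
```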

Figure 5
Grad-CAM results displaying suboptimal detection of features. Grad-CAM results for a selection of images displaying particularly poor identification of relevant features for BSM classification. Lighter shades of yellow highlight areas where the CNN identifies notable features, while darker areas leaning towards blue indicate regions that are not of interest to the CNN when identifying each type of BSM.
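The heat maps in Figure 5 are produced by Grad-CAM, which weights each feature map of the last convolutional layer by its spatially pooled gradient and passes the weighted sum through a ReLU. A numpy sketch of just that combination step, assuming the feature maps and the gradients of the class score with respect to them have already been extracted from the network (the toy arrays below are purely illustrative):

```python
import numpy as np

def grad_cam(fmaps: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Core Grad-CAM combination step.

    fmaps: (H, W, C) activations of the last conv layer for one image.
    grads: (H, W, C) gradients of the target class score w.r.t. those activations.
    Returns an (H, W) heat map normalised to [0, 1].
    """
    weights = grads.mean(axis=(0, 1))                       # global-average-pool the gradients
    cam = np.maximum((fmaps * weights).sum(axis=-1), 0.0)   # weighted sum + ReLU
    return cam / cam.max() if cam.max() > 0 else cam

# Toy example: channel 0 fires on the mark region, channel 1 on the background;
# only channel 0 has positive gradient, so the map highlights the mark region.
fmaps = np.zeros((4, 4, 2)); fmaps[1:3, 1:3, 0] = 1.0; fmaps[:, :, 1] = 0.5
grads = np.zeros((4, 4, 2)); grads[:, :, 0] = 1.0; grads[:, :, 1] = -1.0
heat = grad_cam(fmaps, grads)
print(heat[1, 1], heat[0, 0])  # 1.0 at the mark, 0.0 on the background
```

In practice the resulting map is upsampled to the input resolution and overlaid on the photograph, which is what the yellow-to-blue colouring in Figure 5 represents.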
