Have a personal or library account? Click to login
Probabilistic Models for Predicting Archaeological Site Locations in Marj Bisri (Southern Lebanon): Comparing Frequency Ratio and Shannon’s Entropy Cover

Probabilistic Models for Predicting Archaeological Site Locations in Marj Bisri (Southern Lebanon): Comparing Frequency Ratio and Shannon’s Entropy

Open Access
|Feb 2025

Full Article

1 Introduction

To address the pressing water scarcity issue in Lebanon, the Lebanese government initiated the Greater Beirut Water Supply Augmentation Project, which involves constructing a dam in the Bisri valley in Southern Lebanon. The objective of this project is to enhance water availability in the Greater Beirut and Mount Lebanon Districts (Dar Al-Handasah 2014). However, it is widely recognized that the construction of large dams in river basins poses a significant threat to cultural heritage, as it inevitably leads to the permanent destruction of archaeological sites. River basins have historically served as suitable ecosystems for human settlements (Brandt & Hassan 2000), and the Bisri valley is no exception, with archaeological evidence indicating continuous human occupation as early as the Palaeolithic Period (see below). This study aims to conduct a desk-based assessment, a form of comprehensive review of existing information of the archaeological potential in the Bisri valley, providing an effective tool for mitigation strategies to prevent the loss of archaeological sites and minimize the impact on cultural heritage.

The use of archaeological predictive modelling (APM) has gained prominence in cultural resource management in recent decades (Verhagen & Whitley 2020). Two approaches have been employed to develop these models: inductive and deductive (Kamermans & Wansleeben 1999). Most models are deterministic, where geoenvironmental characteristics play a significant role in predicting site locations (Verhagen & Whitley 2020). However, the integration of socio-cultural variables, such as proximity to communication networks and religious landmarks, has not always been incorporated due to a lack of relevant data (Verhagen et al. 2010; Verhagen et al. 2012). Commonly used probabilistic methods for generating inductive APM include binary logistic regression (Abou Diwan 2020), weight of evidence (Perakis & Moysiadis 2011) and random forest (Castiello & Tonini 2021). In this study, we have adopted an inductive approach to establish our models, utilizing statistical analysis of the distribution of known archaeological sites in relation to selected geo-environmental variables such as slope and soil composition (Van Leusen et al. 2005). The susceptibility of archaeological site occurrence in the Bisri Valley, particularly within the boundaries of the dam reservoir, was assessed using frequency ratio widely employed in Landslide Susceptibility Assessment (Akgun, Dag & Bulut 2008; Mezughi et al. 2011; Silalahi, Arifianti & Hidayat 2019) but less commonly used in archaeological predictive modelling. This method consists of assigning grades based on the ratio of observed archaeological sites to each variable class’s spatial extent (Abou Diwan 2020; Aubry, Luís & Dimuccio 2012; Nicu, Mihu-Pintilie & Williamson 2019). The second method, Shannon’s Entropy, commonly utilized for assessing natural hazard risks such as floods and landslides (Haghizadeh et al. 2017; Bednarik et al. 2010; Constantin et al. 2011; Nohani et al. 2019; Pourghasemi, Mohammady & Pradhan 2012; Yang & Qiao 2010), has not yet been employed in archaeological predictive modelling. This approach aids in identifying environmental factors that significantly influence the occurrence of natural hazards (Bednarik et al. 2010; Roodposhti et al. 2016) could also be employed to assess the occurrence of archaeological sites in relation to environmental variables.

While a hybrid approach that combines both inductive and deductive models is often the most effective (Kvamme 2005; Verhagen 2007), the choice to use an inductive model in this case was driven by the lack of well-defined dating for the archaeological sites, as well as an absence of established theoretical understanding of site distribution patterns or historical foundations for predicting site locations in this area which limits the applicability of deductive models. As a result, the core focus of this paper is to assess whether the two inductive techniques employed in this paper yield consistent results in identifying areas of high archaeological potential highlighting similar regions of interest. The purpose is to provide a rational foundation for the archaeological survey design, meant to be conducted prior to the construction of the projects.

2 Description of the Study Area

The study area is located in South Lebanon 35 km southeast of Beirut and about 15 km inland from the Mediterranean coast. This area is characterized by a mean annual average temperature of 15.5 degrees (Service météorologique du Liban 1977) and a mean annual relative humidity of 62% (Service météorologique du Liban 1982). The climate is characteristic of the Eastern Mediterranean region with a dry summer and a rainy season extending from November to April. The annual precipitation is 1250 mm and the prevailing wind is from the south-west (Garrad 2011). Coniferous forests are the major land cover type in this area followed by field crops, fallow land, grassland, and forbs (FAO 1990). The geological formations in this zone are mainly made of the Jurassic Kesrouane Limestone (J4), Cretaceous Sannine Limestone (C4), and Quaternary deposit consisting of alluvial and fluvial fans exposed along the course of the Bisri river (Dubertret 1955). The Bisri river forms the main water supply in this area receiving water from various perennial and seasonal streams with Nahr el Barouk, and Nahr ‘Aray as the main tributaries. Within the Nahr Bisri valley a water dam is expected to be constructed between the Iklim-el-Kharoub, Barouk-Niha, and Jezzine mountains with a reservoir extending about 4 km upstream of the dam axis and a storage capacity of 125 Mm3 covering a surface of 434 ha (Dar Al-Handasah 2014). The study area was determined using watershed modelling techniques. Watershed delineation was performed utilizing the Watershed tool in ArcGIS Pro 3.4.0 (Esri 2024), based on the spatial extent enclosed by the smallest rectangle that includes all recorded archaeological sites from previous surveys in the Bisri Valley (see below). The study area covers 48 km2, with elevations ranging from 139 m to 1,409 m (Figure 1).

jcaa-8-1-150-g1.png
Figure 1

Map showing the study area.

3 Material and Methods

3.1 Archaeological and geospatial data

The establishment of the archaeological predictive model for the study area is based on the data provided by the archaeological survey conducted by The Polish Centre for Mediterranean Archaeology at the University of Warsaw between 2004 and 2005 (Jakubiak 2011; Jakubiak & Neska 2005; 2007). The data is publicly available in the Environmental and Social Impact Assessment of the Greater Beirut Water Augmentation Project report (Dar Al-Handasah 2014). Seventy-five sites were recorded during these campaigns spanning from the Palaeolithic through to the present day, 67 of which were incorporated into the present study. Seventeen sites fall within the area of the dam reservoir, 13 sites within a 100 m distance and 25 further sites within a 1 km distance. The remaining 12 sites lie within a distance ranging between 1 and 6 km from the project dam reservoir boundaries (Table 1). The Bisri valley and its surroundings offer a suitable environment for human settlement based on the recorded archaeological sites. The archaeological evidence points to an extensive occupation during the Roman period with 17 sites. One of the major landmarks during the period are the drums of four grey granite columns, which seem to have belonged to a tetrastyle prostyle temple located at the crossing point of the two rivers, the Nahr el-Barouk and the Nahr ‘Aray which form the Nahr el-Awwali (Boulanger 1955; Khalil 2009; Tallon 1967). According to Aliquot (2009), the dimension of the portico (14.70 m) is equivalent to that of the largest rural temples of Lebanon. The existence of a rural settlement connected to the temple is an assumption also considered by Aliquot (2009). The temple is believed to have been dedicated to Eshmoun, the chief deity of Sidon as argued by Khalil (2009; 2015). Evidence of a road network connecting the area of Marj Bisri to the Bekaa is reported by Tallon (1967) and further evidence is provided by Khalil (2015). Since the surveyors did not specify the surface area of each recorded sites. The spatial extent for each site was determined by a bounding square (115.35 × 115.35 m) corresponding to 16 pixels, each matching the spatial resolution of the ALOS Prism 30 (AW3D30) (JAXA, n.d.) This method is particularly effective for overlaying archaeological site surfaces onto geo-environmental raster data. It enables the extraction of values from the centres of 16 raster cells in overlapping areas of archaeological sites using the “extract value to table” function in ArcGIS Pro (Nicu, Mihu-Pintilie & Williamson 2019). The total surface area of the 67 archaeological sites recorded within the study area amounts to 1,032 pixels (476,175 m2), instead of 1,072 pixels (510,459 m2), due to the overlapping spatial extents of some sites given their proximity (Table 1).

Table 1

List of the archaeological sites recorded during the Polish survey (after Dar Al-Handasah 2014: 95–96). The coordinates were converted to UTM Zone 36 N.

SITE NO.NATURE OF FINDEASTINGNORTHINGDISTANCE TO DAM RESERVOIRSIZE IN PIXEL
1Pottery, wall, olive crusher, Persian-Roman736235.9853719664.595within16
2Pottery, wall, olive crusher, Persian-Roman736392.95733719806.288within16
3Pottery pieces, Recent736459.46583719724.69within16
4(Unrecorded)736668.38943719936.384within16
5(Unrecorded)736764.92493719997.337within16
6Pottery pieces, Roman737403.92313720552.60759 m16
7Regular stone blocks, Undated737417.01863720753.315101 m16
8Pottery, Undated737941.11233720528.91within16
9House, C20739766.28723720790.118within16
10Village, C19–20738557.13863720966.548116 m16
11Pottery, glazed, C19–20739308.33013720118.956within16
12Pottery, C19–20, one fragment C2BC739753.39743720996.35161 m16
13House, Undated739781.9883721194.369243 m16
14House with large stones, Undated740174.46683721386.072236 m16
15Houses, Pottery, C19–20740317.53833721235.50736 m16
16House, Undated739859.44723721088.402112 m16
17Marj Bisri Temple, Roman740104.19223720585.843within16
18Rock shelter, stone tool, Palaeolithic741331.293722059.421363 m16
19House, C19–20741034.16083722181.43673 m16
20House, C20740462.09463720819.852within16
21Stone arch, Undated740964.0333722919.57830 m16
22Cave, Use Recent740947.9483722943.83858 m16
23(Unrecorded)740524.13133719578.995227 m16
23Ceramics and pottery pieces, Undated740516.31863719581.882223 m16
24Pottery, C18–19, flint flake, Palaeolithic-Neolithic740726.94223719207.961250 m16
25(Unrecorded)740889.52233719104.135373 m16
26House, C19–20740635.58413718425.69833 m16
27Settlement, pottery, Recent, some older738621.2823720046.368178 m16
28Pottery, Undated738496.69333719660.999550 m16
30House, pottery, C19–20739891.62073719800.55875 m16
31Settlement, pottery, C2BC to C3–4739745.4173719766.078162 m16
32Rock blocks, glazed pottery, C19–20739771.69613719436.866205 m16
33Possible stone borer, possibly Palaeolithic739780.31423719298.351225 m16
34House, pottery, Undated740179.88493719209.681107 m16
35House, C20740374.41253719787.96627 m16
36Rock-cut tomb (looted), pottery, C3–4740310.64113720686.5787 m16
37Settlement, pottery, 2C BC739000.75533720348.677within16
38Settlement, pottery, 2C BC737780.92713719704.908125 m16
39House, C19–20737800.10413719242.958439 m16
40Village, glazed pottery, C19–20737278.92493719242.401197 m16
41Small stone fortress, pottery, Undated737275.49093719485.8593 m16
44Stone mill, Undated737256.89013719716.611within16
45House or checkpoint, Undated736884.56543719309.738within16
46Rock-cut tombs, Roman to C2–3736379.60553718985.93616 m16
47Village, pottery, Roman-Ottoman736388.70793719140.299within16
48House or checkpoint, Undated736060.82943718728.392316 m16
49House, Undated736028.36063718579.621459 m16
50Settlement, pottery? Roman736802.52953718053.0151024 m16
52(Unrecorded)737207.683719619.828within16
53Settlement, pottery, Roman735895.80343717777.9261253 m16
55Necropolis, pottery, Undated738142.97973720503.087within16
56Possible rock-cut tomb, Undated735888.56513718388.138695 m16
57Pottery, Undated735864.29363717379.4751646 m16
58(Unrecorded)736367.50433717065.0621913 m16
60House, Undated737455.83493719496.48111 m16
61Pottery, Undated733158.34673717335.0333500 m16
62House, C19–20734539.4093718373.6071782 m16
63(Unrecorded)739470.08523720357.287within16
64Settlement, Roman-Recent733669.4643718389.4032522 m16
66House, pottery? Roman732853.2043718745.6763150 m16
67(Unrecorded)735340.22643719284.107627 m16
68Settlement, flints? Neolithic734509.02423718878.4351551 m16
69Settlement, pottery, Recent734630.17963718671.7661550 m16
71Settlement, pottery, Undated731009.04413717431.1035335 m16
74Pottery, glass, Roman734250.98573718773.4921829 m16
75Settlement pottery, glass, coin, Roman735741.11823719349.428249 m16
76(Unrecorded)735926.54543719363.22983 m16

The creation of the archaeological susceptibility map necessitates various geospatial data layers. To achieve this, we utilized the ALOS Prism 30 (AW3D30) DSM, which offers a horizontal resolution of approximately 30 meters. This data is derived from images captured by the Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM) on the Advanced Land Observing Satellite (ALOS) between 2006 and 2011. From this DEM, we extracted and classified the following geomorphometric parameters: slope, aspect, elevation, and difference from mean elevation (DIFF). The drainage network was created from the Digital Elevation Model using flow accumulation analysis with a stream threshold based on the mean accumulation value (Tang 2000). The soil map was digitized from the 1:50,000 soil map of Lebanon (Darwish et al. 2006) and georeferenced using the provided control points. Subsequently, it was converted from stereographic coordinates to UTM 36 N.

3.2 Model variables

Six geo-environmental variables were selected based on theoretical and empirical studies to be implemented in the process of generating the archaeological predictive models (Balla et al. 2014; Danese et al. 2014; Graves 2011; Heilen et al. 2013; Mink et al. 2009; Nsanziyera et al. 2018). The topographic variables cost distance to streams, slope, difference from mean elevation (DIFF), aspect, and elevation, were all created from the ALOS Prism 30 (AW3D30) (JAXA, n.d.). Proximity to water sources is one of the most crucial factors in establishing settlements. To assess this, a cost surface for traversing to streams was created using the Tobler hiking function (Tobler 1993) and the Path Distance Spatial Analyst tool, following the workflow by Tripcevich (2009). The author uses the following formula: 0.00016666 * exp (3.5 * abs(s + 0.05)) where s is the mathematical slope to convert speed to time and compute the vertical factor cost. A vertical factor table is then used as input in the ArcGIS Pro Path distance tool to generate an accumulative cost surface based on the vertical relative moving angle (VRMA), which is the angle of slope from the From cell to the To cell. According to this model, the minimum time required to traverse one meter occurs at a downslope gradient of 0.049957916 (–2.86 degrees). This approach measures the cumulative time cost of traversing each raster cell in the study area to reach the nearest available water source. The cost distance-to-streams map was reclassified into 10 different decimal hours’ classes using the equal interval method (Figure 2A). Slope is also a commonly used factor in establishing APM given that settlements are mostly established on areas with gentle slopes. The slope was rescaled logarithmically using the Rescale by Function tool in ArcGIS Pro Spatial Analyst. It was then reclassified into 10 different classes using the Natural Breaks (Jenks) method (Figure 2B). Difference from mean elevation (DIFF) was also implemented in this process. This index measures the difference in elevation between each cell in the DEM and the mean elevation within a specified neighbourhood around that cell. Cells with positive values indicate areas higher than their average neighbourhood such as ridges, while negative values indicate lower areas than the average neighbourhood such as valley floors (Wilson and Gallant 2000; De Reu et al. 2011). The DIFF was computed using the workflow determined by De Reu et al. (2011) using a circular neighbourhood of a 300 m radius. Six distinct morphological classes were identified and classified according to the methodology adapted from Weiss (2001), which is based on the standard deviation of the DIFF values: 1- Valley, 2- Lower slope, 3- Flat slope, 4- Middle slope, 5- Upper slope, and 6- Ridge. This classification followed the workflow outlined by De Reu et al. (2011; Figure 2C). Aspect is also considered as a very influential variable considering that sun exposure and wind direction are crucial factors determining site location (Figure 2D).

jcaa-8-1-150-g2.png
Figure 2

The geo-environmental variables influencing the location of archaeological sites.

Elevation or altitude above sea level was also integrated into the computation process and reclassified using equal intervals (Figure 2E). Soil types are also deemed relevant to settlement choice and thus were included in the modelling process. The soil map digitized from 1:50,000 maps shows 13 different types or combinations of soil types within the study area (Figure 2F; Darwish et al. 2006).

3.3 Investigating Archaeological site Distributions in Relation to geo-Environmental Categories

The initial step involves evaluating the spatial distributions of archaeological sites in terms of their relation to selected geo-environmental variables. The purpose of this is to determine whether there are notable statistical differences between the spatial distributions of archaeological sites observed and the background environment. In other terms, we seek to examine, for instance, if sites tend to be closer to water sources or on gentler slopes than expected by chance (Kvamme 1990).

In this assessment, we use the Kolmogorov-Smirnov goodness-of-fit test, originally developed by Kvamme (1990) for archaeological contexts, and subsequently applied in various case studies (Nsanziyera et al. 2018; Wheatley 1995). A comparison is made between the expected surface area of archaeological sites, treated as a statistical population, which is calculated based on the percentage of surface area for each class within each geo-environmental variable on one hand, and on the other hand, the observed surface area occupied by archaeological sites within each class, treated as the sample (Kvamme 1990). For this purpose, the cumulative distribution of the sample and the statistical population is plotted, and their difference (D) is calculated. A critical value (d) is then compared to the maximum value of D (Dmax). The critical value d of a two-tailed test using a significance level of 0.05 is calculated using the equation 1.36/√n (Thomas 1986).

Two hypotheses can be formulated in this context:

H0: The distribution of sites is independent of geo-environmental factors.

H1: Sites are distributed based on geo-environmental variables.

The H0 hypothesis is rejected if Dmax exceeds the critical value d, indicating a statistically significant difference between the sample and population distributions. In Table 2, a statistically significant difference is observed for all geo-environmental variables where Dmax exceeds the critical value d of 0.022. This implies that archaeological sites within the analysis area are likely located in relation to geo-environmental variables. However, as Wheatley (1995) states, the result of this test should be interpreted with caution, as it reflects association rather than causation, and the lack of repeated trials may affect the reliability of the findings.

Table 2

Distribution of Archaeological Sites Across Environmental Variables with Kolmogorov-Smirnov goodness-of-fit test Dmax Values.

VARIABLESCLASSAREA IN PIXELSAREA %EXPECTED SURFACE OF SITES IN PIXELSCUMULATIVE FREQUENCYSITES SURFACE IN PIXELS% OF SITESCUMULATIVE FREQUENCYD
Cost Distance to Streams (expressed in decimal hours)0.001–0.0322096136.53%3770.36554652.91%0.5290.16
0.033–0.71238421.58%2230.58127526.65%0.7960.21
0.071–0.11803014.00%1440.721858.24%0.8780.16
0.111–0.1556439.84%1020.820525.04%0.9280.11
0.151–0.19241617.25%750.892353.39%0.9620.07
0.193–0.23626694.65%480.939323.10%0.9930.05
0.237–0.28319493.40%350.97370.68%1.0000.03
0.284–0.33412772.23%230.99500.00%1.0000.01
0.335–0.3892930.51%51.00000.00%1.0000.00
0.39–0.53880.01%01.00000.00%1.0000.00
n =57375100%10321032100%Dmax = 0.21
DIFF (landforms)Valley739812.89%1330.12913713.28%0.1330.004
Lower slope943216.44%1700.29318718.12%0.3140.021
Flat slope18343.20%330.325666.40%0.3780.053
Middle slope2295240.00%4130.72537936.72%0.7450.020
Upper slope750613.08%1350.85615214.73%0.8920.036
Ridge825314.38%1481.00011110.76%1.0000.000
n =57375100%10321032100%Dmax = 0.053
Slope (Logarithmic)1.001-1310.05%10.00120.19%0.0020.001
1.001–6.22410611.85%190.019282.71%0.0290.010
6.225–7.10620353.55%370.055767.36%0.1030.048
7.107–7.70628444.96%510.104747.17%0.1740.070
7.707–8.12949698.66%890.19112412.02%0.2950.104
8.13–8.482812014.15%1460.33217416.86%0.4630.131
8.483–8.7651011817.63%1820.50919518.90%0.6520.144
8.766–9.0471164220.29%2090.71120019.38%0.8460.135
9.048–9.3291071318.67%1930.89811010.66%0.9530.054
9.33–10584210.18%1051.000494.75%1.0000.000
n =57375100%10321032100%Dmax = 0.144
AspectFlat310.05%10.00120.19%0.0020.00
North880915.35%1580.15416315.79%0.1600.006
Northeast51178.92%920.24312111.72%0.2770.03
East621510.83%1120.352878.43%0.3610.01
Southeast652211.37%1170.46519919.28%0.5540.09
South747113.02%1340.59513412.98%0.6840.09
Southwest47308.24%850.678484.65%0.7310.05
West883415.40%1590.83214814.34%0.8740.04
North West964616.81%1731.00013012.60%1.0000.00
n =57375100%10321032100%Dmax = 0.09
Elevation139–26513712.39%250.02400.00%0.0000.02
266–39224104.20%430.066181.74%0.0170.048
393–5191492426.01%2680.32662160.17%0.6190.29
520–6461203520.98%2170.53633632.56%0.9450.41
647–7741042018.16%1870.717575.52%1.0000.28
775–901964216.81%1730.88600.00%1.0000.11
902–102842297.37%760.95900.00%1.0000.04
1029–115511462.00%210.97900.00%1.0000.02
1156–12828891.55%160.99500.00%1.0000.01
1283–14093090.54%61.00000.00%1.0000.00
n =57375100%10321032100%Dmax = 0.41
SoilAreno-Eutric Leptosols1181420.58%2120.20613713.28%0.1330.073
Calcaro_Hortic Anthrosols49558.63%890.292242.33%0.1560.136
Eutric Arenosols1663728.98%2990.58240138.86%0.5450.037
Eutric Gleysols34025.93%610.641141.36%0.5580.083
Calcaric Regosols920.16%20.64300.00%0.5580.085
Haplic Cambisols860.15%20.64400.00%0.5580.086
Humi-Eutric Cambisols55559.68%1000.74125224.42%0.8020.061
Lithic Leptosols807314.06%1450.88216015.50%0.9570.076
Rendzic Leptosols24494.27%440.92470.68%0.9640.040
Eutric Regosols11071.93%200.94430.29%0.9670.023
Hypoluvic Arenosols12762.22%230.966181.74%0.9840.019
Haplic Luvisols and Leptic Luvisols12042.10%220.987111.07%0.9950.008
Haplic Luvisols7601.32%141.00050.48%1.0000.000
n =5741020.58%10323685100%Dmax = 0.136

3.4 Methods

3.4.1 Frequency ratio model

The first archaeological predictive model in this study was produced using the frequency ratio model (FR). This method is commonly implemented in Landslide Susceptibility Assessment (Akgun, Dag & Bulut 2008; Mezughi et al. 2011; Silalahi, Arifianti & Hidayat 2019). It is based on the existing correlation between the occurrence of archaeological sites and each of the related geo-environmental variables. Each class within every variable is assigned a grade based on the ratio of the observed archaeological site surface to the area or spatial extent of this class (Abou Diwan 2020; Aubry, Luís & Dimuccio 2012; Nicu, Mihu-Pintilie & Williamson 2019). A ratio greater than one indicates a high correlation between a certain class and the occurrence of archaeological sites, whereas a ratio less than one indicates that this class has less influence on archaeological sites occurrence. The frequency ratio method was implemented using GIS techniques. The classes of each geo-environmental variable raster map were assigned the value of the matching frequency ratio using the Spatial Analyst Lookup Tool (Table 3) based on the following equations:

Table 3

Frequency ratio values of the conditioning geo-environmental variables.

VARIABLECLASSNUMBER OF PIXELS IN CLASSCLASS % (A)NUMBER OF SITE PIXELS WITHIN EACH CLASSSITES % (B)FREQUENCY RATIO (B/A)
Cost Distance to Streams (expressed decimal in hours)0.001–0.0322096136.53%54652.91%1.45
0.033–0.71238421.58%27526.65%1.23
0.071–0.11803014.00%858.24%0.59
0.111–0.1556439.84%525.04%0.51
0.151–0.19241617.25%353.39%0.47
0.193–0.23626694.65%323.10%0.67
0.237–0.28319493.40%70.68%0.20
0.284–0.33412772.23%00.00%0.00
0.335–0.3892930.51%00.00%0.00
0.39–0.53880.01%00.00%0.00
DIFF (landforms)Valley739812.89%13713.28%1.03
Lower slope943216.44%18718.12%1.10
Flat slope18343.20%666.40%2.00
Middle slope2295240.00%37936.72%0.92
Upper slope750613.08%15214.73%1.13
Ridge825314.38%11110.76%0.75
Slope (rescaled logarithmic)1.001-1310.05%20.19%3.59
1.001–6.22410611.85%282.71%1.47
6.225–7.10620353.55%767.36%2.08
7.107–7.70628444.96%747.17%1.45
7.707–8.12949698.66%12412.02%1.39
8.13–8.482812014.15%17416.86%1.19
8.483–8.7651011817.63%19518.90%1.07
8.766–9.0471164220.29%20019.38%0.96
9.048–9.3291071318.67%11010.66%0.57
9.33–10584210.18%494.75%0.47
AspectFlat310.05%20.19%3.59
North880915.35%16315.79%1.03
Northeast51178.92%12111.72%1.31
East621510.83%878.43%0.78
Southeast652211.37%19919.28%1.70
South747113.02%13412.98%1.00
Southwest47308.24%484.65%0.56
West883415.40%14814.34%0.93
North West964616.81%13012.60%0.75
Elevation139–26513712.39%00.00%0.00
266–39224104.20%181.74%0.42
393–5191492426.01%62160.17%2.31
520–6461203520.98%33632.56%1.55
647–7741042018.16%575.52%0.30
775–901964216.81%00.00%0.00
902–102842297.37%00.00%0.00
1029–115511462.00%00.00%0.00
1156–12828891.55%00.00%0.00
1283–14093090.54%00.00%0.00
SoilAreno-Eutric Leptosols1181420.58%13713.28%0.65
Calcaro_Hortic Anthrosols49558.63%242.33%0.27
Eutric Arenosols1663728.98%40138.86%1.34
Eutric Gleysols34025.93%141.36%0.23
Calcaric Regosols920.16%00.00%0.00
Haplic Cambisols860.15%00.00%0.00
Humi-Eutric Cambisols55559.68%25224.42%2.52
Lithic Leptosols807314.06%16015.50%1.10
Rendzic Leptosols24494.27%70.68%0.16
Eutric Regosols11071.93%30.29%0.15
Hypoluvic Arenosols12762.22%181.74%0.78
Haplic Luvisols and Leptic Luvisols12042.10%111.07%0.51
Haplic Luvisols7601.32%50.48%0.37

where FRij is the Frequency Ratio of class i of variable j; bij is the percentage of the surface of archaeological sites of class i of variable j; and aij is the percentage of the surface area of class i of variable j.

The APM based on the FR model was computed using the Raster Calculator Spatial Analyst Tool in ArcGIS. The FR values of variables influencing the occurrence of archaeological sites were added together using the following equations:

1
FRij=bijaij
2
APM=j=1nFRij
3
APM=CostDistancetoStreamsFR+SoilFR+SlopeFR+DIFF(landforms)FR+ElevationFR+AspectFR

where FRij is the weight of class i of parameter j, and n is the number of conditioning variables (Abou Diwan 2020; Lee & Talib 2005).

3.4.2 Shannon’s entropy

Entropy is a measure of disorder or energy in a system, originally developed in the field of thermodynamics and statistical physics initially formulated by Clausius (1865) and later developed by Boltzmann (1877). In simple terms, higher entropy indicates less order and interrelation among system elements, resulting in a greater amount of unused or inconsistently used energy (Yufeng & Fengxiang 2009). In Information Theory, Shannon’s Entropy, developed by C. Shannon, represents the average amount of information carried by a message from an information source (Shannon 1948). The concept of entropy has been widely used in assessing natural hazard risks, such as floods and landslides, providing an objective weight index for the factors contributing to these events (Haghizadeh et al. 2017; Bednarik et al. 2010; Constantin et al. 2011; Nohani et al. 2019; Pourghasemi, Mohammady & Pradhan 2012; Yang & Qiao 2010). It helps identify the environmental factors with greater influence on the occurrence of natural hazards (Bednarik et al. 2010; Roodposhti et al. 2016).

Although Shannon’s Entropy (SE) has been extensively used in geosciences, its application in archaeological predictive modelling has been limited. In the context of archaeological site occurrence, entropy refers to the degree to which geo-environmental variables affect the development of settlements. The calculation of a weight value for each variable is based on the principle of bivariate analysis, where the density of archaeological sites within each variable is determined drawing on a procedure described by Vlcko, Wagner& Rychlikova (1980) and used by Bednarik et al. (2010) and Constantin et al. (2011) using the following equations:

4
Pij=bijaij
5
(Pij)=Pijj=1SjPij
6
(Hj)=i=1Sj(Pij)log2(Pij),j=1,,n
7
Hjmax=log2Sj,Sj=number  ofclasses
8
Ij=HJmaxHJHJmax,I=(0,1),j=1,
9
Wj=IJPij

where bij is the percentage of occurrence of archaeological sites in terms of surface for class i of variable j, and aij is the percentage of the surface area of class i of variable j, (Pij) is the normalized conditional probability of Pij which is obtained by dividing it by the sum of all conditional probabilities of each variable j. Hj is the Shannon entropy value, representing the average amount of surprise or uncertainty associated with each variable. It is the negative sum of the product of (Pij) and the base-2 logarithm of (Pij). A low entropy value close to 0 in Shannon entropy indicates a high level of predictability and a low degree of uncertainty in the random variable. Hjmax represent maximum entropy values of each variable, Ij is the information coefficient for each variable and is normalized between 0 and 1. A value closer to 1 indicates a higher predictive capacity of the variable. Wj represents the weighted information coefficient for each variable (Table 4). The Archaeological predictive model was then calculated using the Raster Calculator Spatial Analyst Tool in ArcGIS based on the following equations:

Table 4

Spatial relationship between sites’ causative variables and archaeological sites’ locations using Shannon’s entropy.

VARIABLECLASSPij(Pij)HjHj maxMEAN OF PijIjWj
Cost Distance to Streams (expressed decimal in hours)0.001–0.0321.450.282.023.320.510.390.20
0.033–0.71.230.24
0.071–0.110.590.11
0.111–0.150.510.10
0.151–0.1920.470.09
0.193–0.2360.670.13
0.237–0.2830.200.04
0.284–0.3340.000.00
0.335–0.3890.000.00
0.39–0.5380.000.00
DIFF (landforms)Valley1.030.132.382.581.150.070.08
Lower slope1.100.14
Flat slope2.000.26
Middle slope0.920.12
Upper slope1.130.14
Ridge0.750.10
Slope (rescaled logarithmic)1.001-13.590.252.753.321.420.170.24
1.001–6.2241.470.10
6.225–7.1062.080.15
7.107–7.7061.450.10
7.707–8.1291.390.10
8.13–8.4821.190.08
8.483–8.7651.070.08
8.766–9.0470.960.07
9.048–9.3290.570.04
9.33–100.470.03
AspectFlat3.590.312.913.171.290.080.10
North1.030.09
Northeast1.310.11
East0.780.07
Southeast1.700.15
South1.000.09
Southwest0.560.05
West0.930.08
North West0.750.06
Elevation139–2650.000.001.603.320.460.520.24
266–3920.420.09
393–5192.310.50
520–6461.550.34
647–7740.300.07
775–9010.000.00
902–10280.000.00
1029–11550.000.00
1156–12820.000.00
1283–14090.000.00
SoilAreno-Eutric Leptosols0.650.062.423.700.620.350.21
Calcaro-Hortic Anthrosols0.270.02
Eutric Arenosols1.340.12
Eutric Gleysols0.230.02
Calcaric Regosols0.000.00
Haplic Cambisols0.000.00
Humi-Eutric Cambisols2.520.22
Lithic Leptosols1.100.10
Rendzic Leptosols0.160.01
Eutric Regosols0.150.01
Hypoluvic Arenosols0.780.07
Haplic Luvisols and Leptic Luvisols0.510.04
Haplic Luvisols0.370.03
10
APM=i=1n(WJPJ)
11
APM=(0.20*CostDistancetoStreamsPj)+(0.08*DIFFlandformsPj)+(0.24*SlopePj)+(0.10*AspectPj)+(0.24*ElevationPj)+(0.21*SoilPj)

4 Results and Discussion

The generated maps were divided into five classes using natural breaks of Jenks (very low, low, moderate, high, and very high), which illustrate the archaeological potential of the investigated area (see Figures 3 and 4). As mentioned earlier, the Frequency Ratio (FR) value is correlated with the occurrence of archaeological sites, with a higher value indicating a stronger correlation. The analysis of the cost distance to streams variable reveals that areas located within a minute and 55.2 seconds walking distance (0.032 decimal hours) have the highest FR value of 1.45. In terms of soil, areas with Humi-Eutric Cambisols have the highest FR value of 2.52. The landform variable indicates higher FR values in flat slopes (2.00). Regarding elevation, areas ranging from 393 to 519 meters hold the highest FR value of 2.31. Similarly, areas with a flat and south-eastern aspect have a high FR value, respectively 3.59 and 1.70.

jcaa-8-1-150-g3.png
Figure 3

Probability map based on the frequency ratio method.

jcaa-8-1-150-g4.png
Figure 4

Probability map based on Shannon’s entropy.

The results for the Shannon’s entropy (SE) model indicate that the most influential variables associated with the occurrence of archaeological sites, based on the weighting process, are elevation and slope (0.24), followed by the soil (0.21).

Furthermore, 24.32% of the archaeological sites according to this model are located in the area with the highest potential. The latter covers 8.73% of the total surface. As for the area located within the future dam reservoir, the statistics show that 63.30% of this area is characterized by a very high archaeological potential according to the FR model against 62.06% for the FR model (Figure 5).

jcaa-8-1-150-g5.png
Figure 5

The surface area percentage of each probability class within the dam reservoir.

The region most likely to contain archaeological sites covers an area of 4.27 km2 according to the FR model, compared to 4.16 km2 as indicated by the SE model. The SE model’s highest probability area overlaps with 99.36% of the FR model’s corresponding area while the FR model overlaps with 96.63% of the SE model. Overall, the global match between the two models is 96.04% of their total combined area. Overall, the SE method did not significantly enhance the predictive accuracy of the model, however, both models highlight the same region of high archaeological potential, providing validation for the predictive reliability of the selected approaches. The near-complete spatial match (96.04%) between the models reinforces the assumption that these methods effectively pinpoint areas of significant archaeological interest. This consistency ensures a valid foundation for the archaeological survey design.

4.1 Validation

To validate the predictive model, the simple Gain statistics after Kvamme (1988) were employed, and the results were expressed using the following formula:

12
G=1%PS/%GS

where G is the gain mode, PS is the percentage of the area characterized by the highest probability of hosting archaeological sites, and GS is the percentage of observed archaeological sites within this area. The gain values extend from 0 to 1. A value closer to 1 indicates higher accuracy in predicting archaeological sites. The calculated gain for the FR model is 0.64 indicating a good performance with 24.71% of the observed sites located in the area characterized by the highest potential and covering 8.98% of the study area’s total surface. The SE model also shows a similar performance with a gain value of 0.64. Since Kvamme’s Gain is typically calculated using the same dataset that was used to build and train the model primarily it is therefore considered as an internal validation metric susceptible to overfitting and does not inherently test the model’s ability to generalize to new, unseen data.

External validation is therefore essential to ensure that the model can generalize effectively to unseen data. To achieve this, k-Fold Cross-Validation was implemented, a technique particularly valuable when dealing with a small dataset, as it enables the model to be trained and tested multiple times on different data subsets, optimizing the use of available data (Kohavi 1995; Walker et al. 2023; Wang, Shi & Oguchi 2023). In this case, the dataset was divided into five equal-sized folds. The model was trained on four folds while being tested on the remaining fold, with this process repeated five times. The overall model performance was then averaged across the five test folds. By using k-Fold Cross-Validation, a robust estimate of the model’s performance is obtained, as each site is used for both training and testing in different iterations. In the present case, the 67 archaeological sites were divided into approximately five equal folds by assigning random values to the attribute table in ArcGIS Pro, which were then used to allocate each site to a particular fold. Five APMs were then produced using four folds and tested using the remaining fold and the Kvamme’s Gain for each fold of cross validation was calculated for both methods. The average Kvamme’s Gain is than calculated and compared to Kvamme’s Gain calculated using all sites. If the two value are similar, the model generalizes well. The results in Table 5 indicate that average Kvamme’s Gain values for both methods are not statistically significant as indicated by the results of the T-test were the p-values (0.202 and 0.414) for respectively the FR and Shannon methods are greater than 0.05.

Table 5

Comparison of Kvamme’s Gain values for two models (FR and SE) across five folds.

TRAINING DATA (FOLD NUMBER)TEST DATA (FOLD NUMBER)KVAMME’S GAIN VALUE
FOLD NO.NUMBER OF SITESFOLD NO.NUMBER OF SITESFRSE
1, 2, 3, 4545130.61828420.6191508
1, 2, 3, 5524150.70707490.6094261
1, 2, 4, 5523150.5892920.5199014
1, 3, 4, 5532140.44486450.486424
2, 3, 4, 5571100.47132960.753475
Mean0.5661690.5976755
T – Statistics–1.526–0.910
P-value0.2020.414

A Monte Carlo simulation using random site distributions was also used to validate Kvamme’s Gain. The aim was to assess how well the Archaeological Predictive Model performs compared to random chance by simulating 100 random distributions of archaeological sites across the study area, matching the actual number of sites in the dataset. The objective was to determine whether the Kvamme’s Gain statistics observed for both APMs are significantly higher than what would be expected from random site distributions. For each iteration of the Monte Carlo simulation, Kvamme’s Gain was calculated. The “observed” Kvamme’s Gain from both APMs was then compared to the distribution of Kvamme’s Gain values generated from the random simulations. As illustrated in Figures 6 and 7, the observed Kvamme’s Gain for both APMs is much higher than the distribution of the Monte Carlo simulated gains, which are mostly clustered around 0. This confirms that the APM models are valid and perform significantly better than random chance.

jcaa-8-1-150-g6.png
Figure 6

Comparison of Observed Kvamme’s Gain using FR with Monte Carlo Simulated Gains.

jcaa-8-1-150-g7.png
Figure 7

Comparison of Observed Kvamme’s Gain using SE with Monte Carlo Simulated Gains.

5 Conclusion

The use of geospatial technologies in cultural resource management has increased significantly in recent years. In Lebanon, however, quantitative techniques have not been widely applied to protect endangered archaeological sites. In this study, an attempt was made to generate two inductive predictive models with the primary goal of optimizing the planning and execution of archaeological surveys prior to the construction of the Dam. Given the constraints of time and budget, the purpose of generating these models was to provide data-driven insights into how the archaeological surveys should be structured, which target areas to prioritize allowing for more targeted efforts in the identification of potential heritage assets. The decision to use inductive techniques was driven by the quality and limitations of the available archaeological data, particularly the insufficient sample size and the lack of comprehensive information regarding the chronological distribution or precise dating of the sites. These constraints prevented the use of a deductive approach, which would have required more robust, detailed data to formulate and test specific hypotheses. These inductive-based models were created using two probabilistic methods based on established geostatistical relationships between archaeological data and a series of geo-environmental variables. Geospatial techniques were used to construct these models within a GIS environment. As indicated by the Kvamme’s Gain value, the susceptibility maps generated through this method proved to be reliable for predicting archaeologically significant areas. Although Shannon’s entropy method (SE) did not prove more efficient than the frequency ratio method (FR), both techniques identified the same area as having the highest archaeological potential. This research offered a conceptual framework for implementing an archaeological field investigation plan and devising essential mitigation strategies before the initiating the Dam’s project as the construction of this infrastructure will inevitably lead to the loss of archaeological sites and past anthropic activities, potentially rendering the project non-compliant with international conventions (UNESCO 1972). This study highlighted the necessity for a systematic pedestrian survey of the land surface that will be inundated by the future dam, due to its exceptionally high archaeological potential. The use of remote sensing techniques using Lidar and multispectral imagery could be used as a complementary tool for guiding the walkover survey. These techniques could help detecting visible and subsurface archaeological features ensuring that the full archaeological potential of this area is thoroughly explored.

Data Accessibility Statement

The data used for this article can be found online at https://doi.org/10.5281/zenodo.14618433.

Competing Interests

The author has no competing interests to declare.

DOI: https://doi.org/10.5334/jcaa.150 | Journal eISSN: 2514-8362
Language: English
Submitted on: Feb 2, 2024
Accepted on: Jan 12, 2025
Published on: Feb 5, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Georges Abou Diwan, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.