(1) Overview
Background
Prehistoric human populations in Sulawesi utilized caves and rock shelters within the region’s widespread karst landscapes. These areas are hypothesized to be a crucial factor in the human migration process, as karst ecosystems provided diverse natural resources essential for early human life in the region [3, 4, 6, 8, 9]. Therefore, this area is seen as a potential migration route in the early history of humans in the Wallacea region from the late Pleistocene to early Holocene [1, 7].
Evidence of these early settlements is primarily concentrated in the Maros-Pangkep area and the Morowali area [2, 3, 7, 9]. The majority of documented archaeological sites on Sulawesi Island are located within the karst of the southern part of Sulawesi, with more than 734 prehistoric sites recorded in the Maros-Pangkep karst area (BPK XIX, 2023). Recent archaeological findings indicate that human occupation in Sulawesi dates back over 40,000 years, as evidenced by the dating of prehistoric rock art [6, 10]. Notable discoveries include cave paintings in the Maros-Pangkep Karst region, estimated to be between 39,900 and 51,200 years old, as well as similar findings in the Morowali area, dated to approximately 42,000 years ago [1, 4, 5].
The southeastern arm of Sulawesi is considered to have significant archaeological potential. This region encompasses several prominent karst zones, including the Matarombeo, Mekongga, Tangkelemboke, and Sombori areas [11, 12]. These zones remain largely unexplored archaeologically. Therefore, systematic exploration is needed to document potential archaeological sites in the area.
The limited archaeological data in the southeastern arm of Sulawesi constitutes a major challenge in understanding the distribution of cultural sites in the region. However, with advances in predictive modeling technology, there is an opportunity to apply Archaeological Predictive Modeling (APM), specifically using the MaxEnt approach. Predictive models are often used to reconstruct the spread of humans in the Prehistoric Era [13, 14]. This model allows for spatial analysis based on environmental factors that influence the distribution of archaeological sites. Thus, APM can provide a more accurate estimation of areas with archaeological potential. In the context of prehistory in Indonesian karst areas, similar research has been conducted in the Kapuas Basin, the Gunung Kidul area, and the Enrekang karst area [15, 16, 17].
The MaxEnt method is considered appropriate for application in areas with limited archaeological data. This approach allows the construction of probabilistic distribution models based on known site presence data and relevant environmental variables. The resulting model is then used as a guide in conducting verification surveys in the field. MaxEnt opens up opportunities for systematic prehistoric exploration in the Wallacea region.
Spatial Coverage
This study focuses on the southeastern arm of Sulawesi, a region regarded as having high archaeological potential. Research activities are distributed across several key locations, including the Tenggera Hills, Morowali Islands, Routa, and Lelewawo areas. The study area spans three provinces: South Sulawesi (East Luwu Regency), Central Sulawesi (Morowali Regency), and Southeast Sulawesi (including North Kolaka, Kolaka, East Kolaka, Konawe, and North Konawe Regencies).
Northern boundary: –2.3732
Southern boundary: –3.5026
Eastern boundary: 122.5006
Western boundary: 120.7477
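The extent above can be applied as a simple filter when compiling site records for the study area. The following is a minimal sketch, assuming the boundaries are decimal degrees in WGS84; the function name and the example points are hypothetical and are not records from the dataset.

```python
# Study-area extent in decimal degrees, taken from the boundaries listed above.
NORTH, SOUTH = -2.3732, -3.5026
EAST, WEST = 122.5006, 120.7477


def in_study_area(lat, lon):
    """Return True when a point lies inside (or on the edge of) the extent."""
    return SOUTH <= lat <= NORTH and WEST <= lon <= EAST


# Hypothetical points for illustration only (not taken from the dataset).
examples = {
    "inside": (-3.0, 121.5),
    "too_far_north": (-1.0, 121.5),
    "too_far_west": (-3.0, 119.9),
}
flags = {name: in_study_area(lat, lon) for name, (lat, lon) in examples.items()}
print(flags)
```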
Temporal coverage
The temporal context of this study ranges from the Late Pleistocene to the Paleometallic period, or 19,000–1,000 BP.
(2) Methods
The southeastern arm of Sulawesi is characterized by an extensive karst landscape. Despite its high potential, archaeological data from this region remain scarce. To address this gap, this study introduces a predictive approach to estimate the likelihood of undiscovered archaeological sites. Among the various predictive models used in previous studies, the MaxEnt model has proven to be one of the most effective. In this research, site distribution data are integrated with environmental variables processed using QGIS version 3.42.0.
This research utilized archaeological data collected from previous studies in the study area. The data show a distribution pattern that tends to cluster in certain areas, leaving empty spaces between the groups. A total of 64 sites were used as research objects, scattered across several regional groups. The archaeological data derive from research conducted by the Makassar Archaeological Center, Sue O’Connor, the Manado Archaeological Center, and several students from Halu Oleo University [18].
The environmental data used as input in the MaxEnt analysis consisted of six main environmental variables. These data were obtained from various sources and processed to support the modeling of archaeological site distribution. One of the main sources is the National Digital Elevation Model (DEMNAS) published by the Geospatial Information Agency (BIG). DEMNAS combines several data sources, namely IFSAR with a spatial resolution of 5 meters, TerraSAR-X with a spatial resolution of 5–10 meters, and ALOS PALSAR with a spatial resolution of 11.2 meters. DEMNAS has a spatial resolution of 0.27 arcseconds and uses the EGM2008 vertical datum. These data were processed using QGIS software (version 3.40.0) to produce three environmental variables: elevation, slope, and aspect.
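How slope and aspect follow from an elevation grid can be illustrated with a short sketch. It uses plain central differences on a toy grid and is not the algorithm implemented in QGIS; it assumes that row 0 is the northernmost row and that elevations and cell size share one linear unit (metres here), so a DEM stored in arcseconds would first need to be projected. The function name and the toy values are hypothetical.

```python
import math


def slope_aspect(dem, row, col, cell_size):
    """Slope and aspect (degrees) at an interior cell using central differences.

    dem: list of rows, row 0 being the northernmost; elevations use the
    same linear unit as cell_size (assumed to be metres).
    """
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)  # rise towards east
    dz_dy = (dem[row - 1][col] - dem[row + 1][col]) / (2 * cell_size)  # rise towards north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None  # flat cell: aspect is undefined
    # Aspect = compass bearing of the downslope direction, clockwise from north.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect


# Toy 3x3 grid rising 10 m per 10 m cell towards the east.
east_rising = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
slope, aspect = slope_aspect(east_rising, 1, 1, cell_size=10)
print(round(slope, 1), round(aspect, 1))
```

For the toy grid, the surface rises eastwards at a 1:1 gradient, so the cell slopes at 45 degrees and faces west (a bearing of 270 degrees).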
Furthermore, hydrological data were obtained in the form of shapefiles published by the Geospatial Information Agency (BIG). These data include hydrological information for the North Kolaka Regency area, such as the locations of lakes and streams. The hydrological variables were used to analyze the influence of water resources on the selection of archaeological site locations.
Another data source is the geological map published by Geomap, Ministry of Energy and Mineral Resources (ESDM) of the Republic of Indonesia. This map was used to determine the geological characteristics of the study area and identify geological variables relevant to the location of archaeological sites.
The environmental data were originally a mix of vector and raster layers; all were converted to raster grids in ASCII format. The processed environmental data were then loaded into the MaxEnt software to model archaeological site distribution based on the available environmental variables. This process was carried out to identify environmental factors that contribute to the presence of sites, as well as to provide insight into the potential distribution of other sites in the study area. The output of the MaxEnt program is an indicative map.
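The ASCII raster format read by MaxEnt is the Esri ASCII grid: a six-line header followed by rows of cell values, with all layers sharing the same extent and cell size. The sketch below writes such a grid from a small in-memory array. It is an illustration only; the function name, toy values, and cell size are assumptions, and in practice the conversion would be done with GIS software.

```python
def to_ascii_grid(values, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Serialise a 2-D list (row 0 = northernmost row) as Esri ASCII grid text."""
    nrows, ncols = len(values), len(values[0])
    if any(len(row) != ncols for row in values):
        raise ValueError("all rows must have the same number of columns")
    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    # Missing cells (None) are written as the NODATA value.
    body = [" ".join(str(nodata if v is None else v) for v in row) for row in values]
    return "\n".join(header + body) + "\n"


# Toy 2x3 layer anchored at the south-west corner of the study extent.
toy_layer = [[1, 2, None], [4, 5, 6]]
grid_text = to_ascii_grid(toy_layer, xllcorner=120.7477, yllcorner=-3.5026, cellsize=0.001)
print(grid_text)
```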
MaxEnt was configured with default regularization settings and used a logistic output format. We employed a 10-fold cross-validation technique to assess model performance. The resulting model was evaluated using Receiver Operating Characteristic (ROC) curves, and the Area Under Curve (AUC) score was 0.955, indicating excellent discriminative ability. The score reflects the model's capacity to distinguish between conditions suitable and unsuitable for site presence.
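The two evaluation ideas named here, k-fold partitioning and AUC, can be sketched independently of the MaxEnt software. The example below splits 64 record indices into 10 folds and computes AUC as the probability that a presence location scores higher than a background location, with ties counted as one half. It is a generic illustration rather than MaxEnt's internal procedure, and the scores are invented for the example.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and deal them into k folds of near-equal size."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def auc(presence_scores, background_scores):
    """Probability that a presence score exceeds a background score (ties = 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


folds = k_fold_indices(64, k=10)
fold_sizes = sorted(len(fold) for fold in folds)

# Hypothetical model scores, for illustration only.
example_auc = auc([0.9, 0.8, 0.4], [0.1, 0.5, 0.8])
print(fold_sizes, round(example_auc, 3))
```

With 64 records, four folds hold seven records and six folds hold six. In each round of cross-validation, one fold is withheld for testing while the model is trained on the remaining nine.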
The indicative map shows potential areas where archaeological data are predicted to be present. It can be used to identify areas that may contain archaeological resources and as a reference in planning field research. On the indicative map, potential is classified according to the probability of the presence of archaeological data. The classes are typically represented by color, with the probability value calculated through modeling. The higher the value, the greater the potential for the existence of archaeological sites in an area.
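Classifying a probability surface into ordered potential classes can be sketched as follows. The five class labels follow those used in the field verification; the equal-interval break values (0.2, 0.4, 0.6, 0.8) are an assumption made for illustration, since the thresholds used for the published map are not stated here.

```python
LABELS = ["very low", "low", "moderate", "high", "very high"]


def classify(probability, breaks=(0.2, 0.4, 0.6, 0.8)):
    """Map a logistic output value in [0, 1] to a potential class.

    The default breaks are assumed equal intervals, not the map's actual thresholds.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for label, upper in zip(LABELS, breaks):
        if probability < upper:
            return label
    return LABELS[-1]


# Hypothetical cell values, for illustration only.
sample_cells = [0.05, 0.2, 0.55, 0.79, 1.0]
classes = [classify(p) for p in sample_cells]
print(classes)
```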
Sampling strategy
Field verification of the MaxEnt predictive model was conducted in the Tolala and Porehu Districts of North Kolaka Regency, which were selected because they represent all five probability classes generated by the MaxEnt analysis. The research employed an exploratory pedestrian survey, focusing on the outer margins of karst formations and terrain that was logistically accessible. The sampling strategy was designed as a predictive-guided sampling approach, in which survey areas were selected based on the probability values shown on the MaxEnt map. This approach ensured that field verification covered the full spectrum of predicted potential—from very low to very high—to evaluate the consistency of the model’s spatial predictions.
The survey was carried out over a nine-day period in December 2024. Field teams navigated to target locations using handheld GPS units and conducted walkover inspections, documenting archaeological observations as well as environmental characteristics. The walkover approach did not follow transect or grid patterns; instead, movement in the field was directed by priority zones indicated by the predictive map and by geomorphological indicators such as the presence of rock shelters, exposed stratigraphy, karst terraces, and proximity to water sources.
Environmental conditions significantly influenced the sampling approach. Dense vegetation within the interior karst labyrinth and equipment limitations led the survey to focus primarily on the external boundaries of the karst formations. As a result, most surveyed areas fell within low to moderate potential zones, whereas only a small portion represented the high to very high potential classes (see Figure 1). This situation contributed to a distribution of findings that was largely concentrated in low to moderate probability zones. The field verification recorded one site within a very high probability zone, six sites in moderate probability zones, five sites in low probability zones, and three sites in areas classified as very low probability.

Figure 1
Survey tracks and new sites on a predictive map classified by archaeological site probability.
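The verification counts reported above can be summarised as shares per probability class. The sketch below uses only the four counts given in the text; the high class is not reported there and is therefore omitted rather than assumed to be zero. The shares are not normalised by the area surveyed in each class, so they describe where sites were found and do not, on their own, measure model performance.

```python
# Site counts per probability class, as reported for the field verification.
verified = {"very low": 3, "low": 5, "moderate": 6, "very high": 1}

total = sum(verified.values())
shares = {label: round(100 * count / total, 1) for label, count in verified.items()}

print(total)
for label, share in shares.items():
    print(f"{label}: {share}%")
```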
Quality Control
We re-examined all data related to the distribution of archaeological sites within the spatial coverage. This was done to ensure that there was no bias or duplication in the data that would be input to the MaxEnt application. MaxEnt is used as a predictive modeling tool because of its ability to handle limited datasets and produce distribution estimates based on the principle of maximum entropy. The model works by comparing archaeological site location data with environmental variables to identify distribution patterns that can be used to predict the presence of new sites.
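One way to screen presence records for duplication is to keep a single record per raster cell, so that sites recorded twice, or several points falling within one cell, do not carry extra weight in the model. The following sketch is an assumed illustration of that check and not the procedure used in this study; the record names, coordinates, and cell size are hypothetical.

```python
import math


def deduplicate(records, cellsize, x_origin=0.0, y_origin=0.0):
    """Keep the first (name, lon, lat) record that falls in each raster cell."""
    seen_cells = set()
    kept = []
    for name, lon, lat in records:
        cell = (
            math.floor((lon - x_origin) / cellsize),
            math.floor((lat - y_origin) / cellsize),
        )
        if cell not in seen_cells:
            seen_cells.add(cell)
            kept.append((name, lon, lat))
    return kept


# Hypothetical records, for illustration only.
records = [
    ("Site A", 121.50001, -3.00001),
    ("Site A (re-recorded)", 121.50002, -3.00002),
    ("Site B", 121.60000, -3.10000),
]
unique_records = deduplicate(records, cellsize=0.001)
print([name for name, _, _ in unique_records])
```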
All spatial datasets, both raster and vector, were standardized to the WGS84 coordinate system to guarantee cross-platform compatibility for analysis. GPS data collected during field surveys using handheld devices demonstrated a positional accuracy of approximately 3–5 meters. All coordinate data from field surveys and environmental sources were aligned and verified for projection consistency prior to spatial modeling.
Constraints
Access to deep karst interiors was limited due to rugged terrain and logistical difficulties, confining surveys primarily to the more accessible outer zones. Dense vegetation cover, steep slopes, and heavy rainfall during the December 2024 fieldwork further hampered visibility and survey reach. In addition, some archaeological features, particularly burial caves, may have been disturbed or looted in the past, which could affect the accuracy and contextual integrity of the findings. From a modeling perspective, reliance on presence-only data inherently limits the ability to distinguish true absence zones, potentially skewing distribution predictions in areas with sparse data. Moreover, the hydrological datasets used in this study had spatial gaps and inconsistencies, requiring manual edits and assumptions prior to integration into the modeling process.
(3) Dataset description
Object name
Archaeological site input data and survey results in the form of CSV files; MaxEnt processed output, namely Archaeological_Site (.html); and raw MaxEnt output, namely Archaeological_Site (.ASC).
Data type
Primary data, secondary data, processed data, interpretation of data
Format names and versions
.asc, .html, .png, .jpeg, .rar, .xlsx.
Creation dates
01/02/2023–06/18/2025
Language
English, Indonesian
License
Creative Commons Attribution Share Alike 4.0 International
Repository location
Publication date
12/02/2025
(4) Reuse potential
This dataset provides opportunities for reuse in various disciplines such as archaeology, environmental studies, and spatial planning. One of its main uses lies in applying predictive modeling in other regions with comparable karst features to detect areas likely to contain archaeological remains. The structured subsets in the dataset enable comparative studies, such as assessing site distribution patterns across different prehistoric timeframes, and facilitate validation efforts through field investigations in varied geographic contexts.
The predictive maps generated from this model serve as a valuable tool in identifying previously unrecorded archaeological sites. In addition, this dataset supports sustainable land use planning by identifying zones of archaeological significance that may overlap with current or future industrial activities. In regions where cultural heritage resources coexist with extractive industries, predictive modeling can help align preservation strategies with economic development goals. Such models offer evidence-based insights for crafting spatial policies and regulatory frameworks aimed at protecting archaeological assets.
The data provided in raster and shapefile formats are fully compatible with standard GIS platforms, increasing their usefulness for spatial analysis and long-term cultural heritage management. The dataset also has the potential to be integrated into web-based GIS applications, thus enabling data-driven decision-making.
Acknowledgements
The authors would like to thank the local volunteers who provided assistance and support during the exploration of the North Kolaka karst area. Their contributions included helping with navigation, providing geographical overviews, and supporting research logistics. In addition, we would like to express our deepest appreciation to all individuals and institutions involved, both directly and indirectly, in supporting the success of this field research. The support of various parties, including academic institutions, communities, and government agencies, has been instrumental in ensuring the continuity of this research, from planning and data collection to the final analysis. We also express our sincere gratitude to Dr. Martin Hinz for his willingness to review and help improve our draft.
We also thank the Indonesian National Research and Innovation Agency (BRIN), Wallacea Heritage Indonesia, and the Archaeology Department, Hasanuddin University, Makassar.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
Yadi Mulyadi: Conceptualization, project administration, data curation, methodology, investigation, formal analysis, writing-original draft, writing-review & editing.
Hermawan: Conceptualization, project administration, data curation, methodology, investigation, formal analysis, writing-original draft & editing.
Fahran Reza: Conceptualization, project administration, data curation, methodology, investigation, formal analysis, writing-original draft & editing.
Muh Alif: Conceptualization, data curation, methodology, investigation, formal analysis, writing-original draft & editing.
Fakhri: Conceptualization, data curation, methodology, investigation, formal analysis, supervision, funding acquisition.
Hasanuddin: Conceptualization, data curation, methodology, investigation, formal analysis, supervision, funding acquisition.
Supriadi: Conceptualization, data curation, methodology, investigation, formal analysis, supervision, funding acquisition.
Darfin: Data curation, methodology, investigation, formal analysis.
Mega Ayu Alfitri: Data curation, methodology, investigation, formal analysis.
Enriko: Data curation, methodology, investigation, formal analysis.
Muhammad Agang: Data curation, methodology, investigation, formal analysis.
Muhammad Ilham Nur: Data curation, methodology, investigation, formal analysis.
Syamsul Bahri: Data curation, methodology, investigation, formal analysis.
Bernadeta Kuswarini Wardaninggar: Data curation, methodology, investigation, formal analysis.
