1. Introduction
Archaeological research in India has a complex and layered history that is interwoven with the nation’s political, social and economic past (Chakrabarti 1988; 2003). India’s archaeological research tradition has deep colonial roots. The Archaeological Survey of India (ASI) was established in 1861 by Sir Alexander Cunningham, an army officer of the colonial British government. Likewise, Palaeolithic studies in the subcontinent began when Robert Bruce Foote, a British geologist, discovered a Palaeolithic stone tool in 1863 near Chennai, then known as Madras (Chakrabarti 1979; Pappu 1991–92; Kennedy 2000). Excavations and explorations were largely dominated by the ASI and its members until 1947, after which this hegemony gave way to a more active role by colleges and universities in Indian archaeological studies. Although archaeology in the country began under British colonialism, there has been a strong tradition of archaeological research taken forward by Indian researchers in the post-colonial era (Misra and Nagar 1973; Sankalia 1974; Bhan 1997; Chakrabarti 2003). The immediate post-Independence era is also an important period for understanding and untangling how the many overlapping and competing facets of Indian society contributed to archaeological inquiry in the country, especially factors such as geographical region, caste, economic status and social position that often influence education and research (Kumar 1998; Chaudhary 2015).
The Lower Palaeolithic record of South Asia is extremely rich and has immense untapped potential (e.g. Mishra 2006; Misra 2001; Petraglia and Korisettar 1998; Pappu and Misra 2001). While the pre-Acheulean evidence reported from India and Pakistan remains contentious, the earliest (Large Flake) Acheulean occurrence is dated to around 1.5 mya from Attirampakkam, Tamil Nadu (Pappu et al. 2011) and the youngest extends to 85 kya from Singi Talav, Rajasthan (Blinkhorn et al. 2021). That record overlaps considerably with the oldest known South Asian Middle Palaeolithic at 385 kya, also from Attirampakkam (Akhilesh et al. 2018), and appears to follow a global trend of temporal overlap between the two Palaeolithic periods (Key et al. 2021). Large Flake Acheulean (Sharon 2010) artefacts include cores, polyhedrons, hand axes, cleavers, miscellaneous bifaces, scrapers, flakes, blades, choppers, notches, debitage and other elements (Misra 2001). In India, the term Acheulean (or Acheulian) was historically referred to as the Madrasian (Pappu and Akhilesh 2022) for many decades and, likewise, the transition from the use of African terminology (Stone Age) to European terminology (Palaeolithic) took place in India only in the 1960s, as evident in IAR volumes and other contemporary literature of that decade (see Chauhan, in press).
This paper uses selected data on the Indian Lower Palaeolithic compiled by the authors from issues of Indian Archaeology – A Review (IAR), an annual publication by the Archaeological Survey of India, detailing the year’s archaeological and cultural activities undertaken across the country (Figure 1). This includes research by members of ASI, various state-level archaeology departments, and university departments. This record is open access and free PDFs are available for most issues on ASI’s official website under the National Mission on Monuments and Antiquities (NMMA).1 We have also made our compilation of sites available via Zenodo.2

Figure 1
The geocoded Lower Palaeolithic sites plotted on a map of South Asia.
This paper has a three-fold but combined focus on methodological and interpretative issues that arise through GIS applications on archaeological data. It is a first attempt to utilise ESRI’s geocoding service to enhance an abundant but patchy dataset. We are using the distribution of Lower Palaeolithic sites as a proxy to explore the following:
How a simple GIS process such as geocoding can run into various obstacles when the dataset in question is not ideal (e.g. legacy data, dynamic region).
How preferential sampling in a region can be understood through an analysis of the social, political and economic realities of the time.
The use of GIS as a potentially powerful tool to level the playing field of archaeological research in countries in the Global South such as India.
The methodology followed in this article is outlined below:
Data compilation began from the first publication of IAR in 1953 and continued until 1976, spanning a total of 23 years and including the early years of India as an independent nation.3 The focus was on compiling all the recorded instances of Lower Palaeolithic sites in India that were present in the IARs available online. For this, the “Exploration” and “Excavation” sections of the reports were systematically and thoroughly scoured and relevant data noted down.
This gathered data was then compiled into a Microsoft Excel spreadsheet resulting in a total of 1535 Lower Palaeolithic sites or occurrences being recorded. The columns included the following headings: volume, year, site name, state, district, cultural period, artefact(s) reported, name(s) of investigator(s).
All this information was then imported into ArcGIS Pro and geocoded through ESRI’s Online World Geocoding Service.4 Geocoding is the process of converting a text-based description of a location to a position on the map (Goldberg et al. 2007). It is extremely beneficial because it turns textual place descriptions into mappable points, which allows geographical patterns to be recognised and address data to be analysed. Geocoding can be done by adding one place description at a time or by supplying a table with multiple locations at once, a process known as inputting the data. The authors have used ESRI’s geocoding service (which is paid), but free and open-source alternatives are also available, for example through QGIS. The software classifies the input into either absolute or relative values. Relative values cannot be geocoded and are ignored, while absolute values are turned into coordinates on the map.
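As a minimal sketch of how a table row can be turned into a single text description for a geocoder, the snippet below joins the locational columns listed above into one string. The `SiteRecord` class and the `single_line_address` helper are hypothetical names introduced only for illustration; they are not part of ArcGIS Pro or of the authors' actual workflow, and the sample row simply reuses an entry discussed later in the text.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    """One spreadsheet row; field names mirror the column headings (hypothetical class)."""
    volume: str
    year: str
    site_name: str
    state: str
    district: str
    cultural_period: str
    artefacts: str
    investigators: str


def single_line_address(rec: SiteRecord, country: str = "India") -> str:
    """Join the locational fields, most specific first, skipping blank ones."""
    parts = [rec.site_name, rec.district, rec.state, country]
    return ", ".join(p.strip() for p in parts if p and p.strip())


# Sample row based on an IAR entry mentioned in the text; artefacts left blank.
row = SiteRecord(
    volume="IAR 1957-58",
    year="1957-58",
    site_name="Yadhoda",
    state="Bombay",
    district="West Khandesh",
    cultural_period="Lower Palaeolithic",
    artefacts="",
    investigators="S.A. Sali",
)

print(single_line_address(row))
```

Skipping blank fields matters for legacy records, since some IAR entries do not give a district at all; the description then falls back to site name, state and country.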
Just over half of the compiled sites were geocoded successfully on the first run [57% (n = 874)], while 40% (n = 608) returned tied results and 3% (n = 53) were unmatched. Tied results were those sites that had multiple possible locations, while unmatched sites were those without any obvious matches.
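The shares of each first-run outcome follow directly from the reported counts. The short sketch below recomputes them to one decimal place; the dictionary keys are labels chosen here for illustration only.

```python
# Counts of first-run geocoding outcomes as reported in the text.
counts = {"matched": 874, "tied": 608, "unmatched": 53}

total = sum(counts.values())

# Percentage share of each outcome, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
print(shares)
```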
The process of geocoding was followed by “rematching” which allows manual intervention to resolve the ties and unmatched sites. Additional information can be used to aid the decision-making process of rematching sites. The process of rematching sites involved consulting additional sources on the named sites, such as articles, papers, mentions in books etc. We also used remote sensing (Google Earth Pro) to narrow down on possible locations using the names, descriptions, and approximate locations given in the IARs.
The geocoded sites were plotted on a map to visualise any emerging spatial patterns in their geographic distribution across India (Figure 1). Thematic maps have also been produced from this data to better understand the relationship between Lower Palaeolithic sites and the surrounding geographies, although all patterns and observations are likely to change once all or most known Lower Palaeolithic sites are mapped from all available IARs and other published and unpublished literature.
The mapping of these sites allows us the opportunity to gain insight into the spatial relationships between Lower Palaeolithic sites in India and leads to a better comprehension of the research history of the Palaeolithic in India’s early post-Independence period (Figure 2). Mapping of Lower Palaeolithic sites from the immediate post-Colonial era is also a vehicle for understanding various facets of India’s social, political and economic history, as these factors undoubtedly influenced not only who was able to participate in research but also the places that attracted the most interest, leading to possible preferential or biased sampling of the archaeological sites. The same observations and implications would be applicable to Middle and Upper Palaeolithic sites as well; all were compiled and mapped for a conference presentation (Chauhan et al. in progress) which is currently being prepared for publication.

Figure 2
Geocoded Lower Palaeolithic sites plotted on a map and divided according to reported context.
It is important to note that the resulting geocoded locations indicate only general locations of Palaeolithic investigations near the concerned villages; the precise site locations visited by the previous researchers may be as much as a few km away in many cases. Therefore, any emerging patterns must be cautiously interpreted as they are not solely indicative of actual spatial relationships in the Palaeolithic at the intra-site, inter-site or regional level. Moreover, the geocoded locations were originally determined by various factors such as preferred research focus, sampling bias and selective fieldwork. This article is an attempt to unravel these factors and their influences on our current historical understanding of India’s Lower Palaeolithic record. Furthermore, it aims to suggest possible avenues that can help level the playing field of archaeological research in India through the use of accessible datasets and Geographical Information Systems (GIS) as a tool to mitigate barriers of cost and language.
2. Issues with Geocoding Legacy Datasets
There is immense intrinsic value in geocoding and enhancing legacy datasets such as the historical one used in this study, Indian Archaeology – A Review. It not only provides a rich and valuable record of archaeological data from which current researchers can draw information but also allows for a re-interpretation of older data, and transforms that data into a medium that can be made more accessible to everyone. Geocoding is also required to map the Palaeolithic of India because less than 1% (n = 4) of the 1,535 sites in the current study had geocoordinates in the literature. However, there are multiple additional issues that arise when geocoding historical datasets that must be adequately dealt with to generate usable and accurate results. At the moment, ESRI only appears to support geocoding in India under Level 3 (https://developers.arcgis.com/rest/geocode/api-reference/geocode-coverage.htm) which leads to less accurate results such as these. There is also limited functionality offered by ESRI for South Asian vernacular languages, which causes issues we will explore in this section.
Lower Palaeolithic sites in India were discovered from the 1860s onwards but the IAR record ranges from the 1950s onwards. Since then and until the present day, a multitude of political, administrative, cultural and geographic changes have taken place, including variations of place names and spellings, shifting political boundaries and the creation of new districts and states. The organisation of Indian states is also partly culturally based (Singh 2008), and the associated linguistic diversity further complicates the process of geocoding. At the end of the Colonial period, right before independence, India was largely divided into regions of three broad categories: 1. Territories of the British Empire, 2. Princely states governed by Indian royal families, and 3. Territories of France and Portugal (Nehru 1963; Copland 1991).
At the time of independence, India was divided into 17 provinces that were under the British government and were either going to become part of the freshly formed independent India or join the newly created country of Pakistan (Copland 1991). Political boundaries were unstable during the first few years of independence. Indeed, India became a sovereign democratic republic in 1950 and, by this time, five more erstwhile princely states (Hyderabad, Jammu & Kashmir, Manipur, Junagadh, and Tripura) had joined the Indian Union (Sherman 2007; Haokip 2012; Mohan 2012; Gangte 2013; Ankit 2018).
Andhra state separated from Madras state in 1953 to become a state dominated by the Telugu language (Singh 2008). Following this, the States Reorganisation Commission (SRC) was created to assess whether there would be any potential benefit in further restructuring the states along linguistic lines. The States Reorganisation Act of 1956 made further changes amongst existing states along linguistic lines, dividing a few states (e.g. Kerala separated from Mysore state) and making a few states into Union territories (e.g. Chandigarh) (Singh 2008). Gujarat was separated from the then Bombay state (the remainder of which is now Maharashtra state) in 1960. The 1970s, 1980s, 2000s and 2010s were no less active in the creation of new states (e.g. creation of Manipur and Tripura, Jharkhand and Telangana) (Singh 2008; Haokip 2012). As of March 2022, India has a total of 28 states and 8 Union Territories, with the most recent reorganization of a state taking place in 2019 when the state of Jammu & Kashmir was split into the Union territories of Jammu & Kashmir and Ladakh. This illustrates how dynamic India’s political boundaries are, and therefore careful consideration is required when historically recorded data points (archaeological and otherwise) are utilised in the present day.
This is not only applicable at the state level, but also at the district or town level. For instance, the 1957–1958 IAR records an entry with 13 Lower Palaeolithic sites discovered by S.A. Sali in the District of West Khandesh, in the then state of Bombay. The entry seems straightforward enough before getting into its intricacies. First and foremost, which version of “Bombay” does this entry allude to? Is it Bombay province as the British made it, or the immediate post-Independence Bombay state? Is it Bombay before or after it was split along linguistic lines in 1960? Some of these ambiguities are easy to untangle. Since this entry is from 1957–58, it was referring to a unified Bombay that contained the present states of Maharashtra and Gujarat.5 In the present day, however, within which political boundaries would these sites be located: present-day Maharashtra or Gujarat? This can (theoretically) be simply resolved by looking up the location of the District of West Khandesh and seeing which state boundary it falls under. However, that district no longer exists under that name, which was abandoned by the government when the district was renamed. The West Khandesh District came into existence in 1906 when the administrative division of Khandesh was bifurcated into its eastern and western parts (Government of Maharashtra, 2022). West Khandesh was later renamed Dhule District and it is presently located in the state of Maharashtra (Government of Maharashtra, 2022). Therefore, the aforementioned 13 Lower Palaeolithic sites can now be unequivocally located and mapped in the Dhule District, Maharashtra. Geocoding services such as the one used here do most of the work; nonetheless, an awareness of the region’s political history and dynamics is still extremely important, as illustrated in this example, where there might be overlapping and/or unmatched cases.
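One way to make this kind of historical knowledge reusable is a small lookup table that translates recorded administrative names into their present-day equivalents before geocoding. The sketch below is only an illustration under that assumption: the `HISTORICAL_TO_MODERN` table and the `modernise` function are hypothetical, and the single entry is the West Khandesh example from the text.

```python
# Hypothetical lookup from (district, state) as recorded in the IAR
# to the present-day (district, state). Only the example from the text is filled in.
HISTORICAL_TO_MODERN = {
    ("West Khandesh", "Bombay"): ("Dhule", "Maharashtra"),
}


def modernise(district: str, state: str) -> tuple[str, str]:
    """Return the present-day names if known, otherwise the recorded names unchanged."""
    key = (district.strip(), state.strip())
    return HISTORICAL_TO_MODERN.get(key, key)


print(modernise("West Khandesh", "Bombay"))
print(modernise("Unknown District", "Bombay"))
```

Entries that are not in the table pass through unchanged, so they can still be flagged for manual review rather than being silently reassigned.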
Another essential factor to be considered while working with these types of datasets is linguistic complexity, especially considering that most Indian languages use a non-Roman script such as Devanagari or Gaudi (Salomon 2003). There is immense linguistic diversity within the Indian Subcontinent, with 22 official languages, 122 formally recognized (and many more regional) dialects and variants. The IARs have always been written in English, a practice that continues to this day as a colonial legacy. However, the drawback of using English is that when it comes to data collection, it often fails to capture the complexities of local languages as well as the spellings and meanings of specific place names.
During archaeological exploration in India, discovered sites are usually given the name of the nearest village or town and these names are generally established in any given regional language. When transliterated into English, they often prove to be inaccurate, as English lacks some of the specific sounds needed to do justice to the local iteration. One example is the Palaeolithic site of “Yadhoda”, which was reported in the IAR volume 1957–58 by S.A. Sali in District West Khandesh, Bombay state (present day Dhule District, Maharashtra). This entry was “Unmatched” during the geocoding exercise with 51 possible sites within the state of Maharashtra itself (Figure 3). All the suggested names were variants of the name “Yadhoda”, such as “Vadhoda” or “Yashoda”. There was no direct match, and the choice while rematching this site was between keeping the “Y” sound at the beginning of the site name or the “dh” sound in the middle. This disparity can be attributed to two reasons, both of which demonstrate equally important but different issues with using English while recording data in this context. It may be that a particular sound or local pronunciation is difficult to capture authentically in English, leading to an inaccurate representation of the actual site name. Alternatively, it may be due to an evolution of the name itself from when it was first recorded to the present day. There certainly are patterns in language evolution that might assist in narrowing down possibilities. However, that would require an intimate knowledge of the linguistic development of the Marathi dialect spoken in Dhule District, which the authors do not possess. In this case, the option that was closest to the location of Dhule District was chosen as the most likely geocoded match. This further highlights the importance of keeping local realities firmly in the centre while transmuting data into spatial systems.

Figure 3
A screenshot of ArcGIS Pro showing the 51 possibilities for the unmatched site of Yadhoda.
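The dilemma between “Vadhoda” and “Yashoda” can be illustrated with a generic string-similarity score. The sketch below uses `difflib.SequenceMatcher` from the Python standard library, which is an assumption made here for illustration and not the matching logic of ESRI's service. Under this measure both variants score identically against the recorded name, which is why spelling alone could not decide the rematch.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


recorded = "Yadhoda"
candidates = ["Vadhoda", "Yashoda"]  # the two variants named in the text

scores = {name: similarity(recorded, name) for name in candidates}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Because each variant differs from the recorded spelling by a single letter, a score of this kind only narrows the field; a non-linguistic criterion, such as proximity to the reported district, is still needed to pick one candidate.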
Additional problems also emerge when multiple towns and villages share the same name (formally or informally), especially those located in the same state or general region. The thirty-five tied possibilities for the Lower Palaeolithic site of Sangam, Bombay state (present day Maharashtra) are an apt example of this problem (Figure 4). This site was originally reported in the 1955–56 IAR by H.D. Sankalia and S.B. Deo. In ArcGIS, not only were all thirty-five possibilities located within Maharashtra, but they were also widely distributed within the state, placing the candidate locations very far apart from one another. In this case, the original entry mentioned that the site was ‘9 miles from Nevasa’. Therefore, the ‘Sangam’ closest to the town of Nevasa was chosen. A similar solution is not possible in all cases, as entries often do not mention any specific landmarks or geographic markers.

Figure 4
A screenshot showing the 35 tied matches for the site of Sangam in Maharashtra, and the extent of their distribution within the same state.
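The landmark-based choice described for Sangam amounts to picking the tied candidate with the smallest great-circle distance to a reference point. The sketch below shows that logic with the haversine formula. All coordinates are assumptions for illustration: the Nevasa position is approximate and the three candidate points are invented placeholders, not the actual tied matches returned in ArcGIS.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two lat/lon points (spherical Earth, R = 6371 km)."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_candidate(landmark: tuple[float, float],
                      candidates: dict[str, tuple[float, float]]) -> str:
    """Return the label of the candidate closest to the landmark."""
    return min(candidates, key=lambda k: haversine_km(*landmark, *candidates[k]))


nevasa = (19.55, 74.93)  # approximate, assumed coordinates for the town of Nevasa

# Invented placeholder candidates standing in for tied geocoding results.
tied = {
    "Sangam A": (19.60, 75.05),
    "Sangam B": (17.20, 74.10),
    "Sangam C": (20.90, 77.75),
}

best = nearest_candidate(nevasa, tied)
print(best, round(haversine_km(*nevasa, *tied[best]), 1), "km")
```

Where an entry also states a distance, as in ‘9 miles from Nevasa’ (roughly 14.5 km), the computed distance of the chosen candidate can be compared against it as a plausibility check.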
The period from which our locational data derive (1950s–1970s) means that the locational accuracy of the Palaeolithic sites varies considerably. In some cases, sites are recorded in IAR without even mentioning their associated districts; in others, specific site locations are given along with distinct local markers. For example, the site of Ganja Pahar was mentioned to be located ‘17 km north of the Grand Trunk Road near Nirsha’ (Jayaswal: IAR 1961–62, p. 4). As is clear from these examples, the locational precision of sites compiled from these volumes cannot simply be classified as accurate or inaccurate; rather, they fall on a spectrum of locational integrity that must be carefully considered when engaging with this dataset.
Despite the challenges imposed by use of a Western non-local language to record the data, after geocoding, the data can be enhanced using other languages to make the record more accessible to the international academic community. While the remote structure and usage of GIS makes it inherently disconnected from local realities, integrating a vernacular sensitivity to the use of GIS would make it much more accurate and usable. It also highlights the importance of the human element that is still required in an activity such as geocoding, to verify that the data is correct. This exercise also demonstrates that geocoded points are not always authoritative but represent the most likely possibility.
3. Patterns in the Palaeolithic or Sampling Bias? – Understanding Post-Independence Research Traditions
The geocoding of these sites is extremely beneficial, especially when it comes to visualization of data and understanding the spatial relationships between Lower Palaeolithic sites. There are various clusters of sites that emerge from this data (see Figure 5 for a state-wise breakdown of geocoded locations), although their densities and locations are likely to change once more comprehensive mapping is carried out (in progress). This raises the question: does the distribution of these sites reflect actual spatial relationships between Lower Palaeolithic sites, or is it a result of preferential sampling by past researchers? Presumably both: all the archaeological sites present on our generated map (Figure 1) are those that were discovered in the 1950s through 1970s in India; therefore, this dataset is not only representative of Lower Palaeolithic patterns but also of the research history in play during this specific time period. This period left India to grapple with many legacies of the Empire, a nation divided along religious lines, a struggling economy, and many other socio-political issues. Considering these factors allows for a deeper understanding of preferential sampling at play during this time, allowing a more well-rounded interpretation of the data.6 Any major clusters or blank spots are not just a function of Lower Palaeolithic preference but also a function of socio-economic differences and educational history. Archaeological research history varies from state to state and often reflects many state-level differences of economics, ease of social mobility, access to education and regional politics (Figures 5 and 6).

Figure 5
Distribution of reported Lower Palaeolithic sites per Indian state in the volumes of IAR from the 1950s to the 1970s.

Figure 6
Distribution of Lower Palaeolithic sites found and reported between 1952 and 1976 in IARs according to each Indian state.
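A state-wise breakdown of this kind can be produced by tallying the state column of the compiled table. The sketch below shows one way to do so with `collections.Counter`; the four rows are invented placeholders rather than entries from the compilation, and the `sites_per_state` helper is a hypothetical name.

```python
from collections import Counter

# Invented placeholder rows; the real table has one row per reported site.
rows = [
    {"site_name": "Site A", "state": "Maharashtra"},
    {"site_name": "Site B", "state": "maharashtra "},  # inconsistent spelling and spacing
    {"site_name": "Site C", "state": "Gujarat"},
    {"site_name": "Site D", "state": "Madhya Pradesh"},
]


def sites_per_state(records: list[dict]) -> Counter:
    """Count sites per state after normalising whitespace and capitalisation."""
    return Counter(r["state"].strip().title() for r in records)


tally = sites_per_state(rows)
for state, n in tally.most_common():
    print(f"{state}: {n}")
```

Normalising the state names first matters for legacy data, where the same state may be spelled or capitalised differently across volumes.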
This section of the paper will focus on the regions of India with the most noticeable clusters of Lower Palaeolithic sites from this preliminary study, as well as significant absences of sites and aims to evaluate the potential reasons for these disparities. The goal is to balance an archaeological view of this distribution (e.g. looking at raw material sources and geology), with keeping the many socio-political reasons that affected their discovery at the forefront as well. To understand the disparity between numbers of discovered Lower Palaeolithic sites between states during this time, the per capita Net Domestic Product (NDP) of different states from the 1960s to the early 1980s was analysed, which showed stark differences in state economies. A detailed analysis of all gaps and clusters in the distribution is beyond the scope of this essay, so this section will mainly focus on the following attributes: i) the large cluster of sites in Maharashtra; ii) the lack of sites in the Ganga river basin, and iii) the cluster of sites in Madhya Pradesh.
One of the most obvious clusters is the one in the state of Maharashtra. If the geology of the state is considered, this large cluster is counterintuitive. The state of Maharashtra is part of peninsular India, where rocks that were exposed during the Quaternary period are widely distributed. The Deccan Trap forms a large part of Maharashtra’s geology, which is largely dominated by basaltic lava flows (Pande 2002). The vast majority of the Lower Palaeolithic assemblages from this region are made of basalt (Mishra 2007). Basalt is one of the most weatherable rocks, and prehistoric archaeological assemblages made on it often do not survive various terrestrial and fluvial site formation processes (see Mishra 1982). Nonetheless, it is interesting that so many Lower Palaeolithic artefacts and assemblages have been discovered in a region that does not have the perceived ideal conditions for their preservation. It is unclear to what degree weathering affects the preservation of assemblages on such rock types. Long-term geomorphological studies are required to establish the extent of correlation between weathering and assemblage condition/preservation, if there is a meaningful one at all. Even though at first glance the Lower Palaeolithic record of Maharashtra looks very rich, if weathering is a prominent negative factor in artefact preservation, then it is still a small percentage of the original pre-depositional archaeological record. A contributing factor to this rich record is the high concentration of research interest that Maharashtra has received due to the presence of one of the oldest and most active archaeological research institutes and departments in the country, the Deccan College Postgraduate and Research Institute.
The Deccan College was originally established as Hindoo College in 1821 by the then Governor Mountstuart Elphinstone under the Colonial British government in Bombay presidency and it is also one of the oldest educational institutes established in the country (Sharma et al. 1990). Professor H.D. Sankalia, who joined Deccan College as a lecturer in 1939, had an immense impact on the field, with many flourishing fieldwork campaigns and excavation projects happening under his supervision (Dhavalikar 1990). One such site is Nevasa in Maharashtra, which yielded a continuous sequence from the Lower Palaeolithic to the Medieval period (Sankalia et al. 1963). Professor B. Subbarao, who established the Department of Archaeology and Ancient History at The Maharaja Sayajirao University of Baroda, Gujarat in the early 1950s, also studied at Deccan College. He worked prolifically in Maharashtra and Gujarat. Gujarat also has quite a high number of sites discovered, though not as high as Maharashtra. These examples illustrate the significant impact the location of Deccan College and The M.S. University of Baroda had on the many sites that were discovered in the combined western region of India. This large cluster of sites also represents the concerted efforts of the many faculty members and students who would have undertaken extensive field explorations during their academic careers at these institutions. Therefore, it is unsurprising that such a large number of sites were found in these areas despite geological conditions not being ideal for long-term preservation. It is also worth noting that both Maharashtra and Gujarat were performing significantly above the all-India-average when it came to economic growth in the period after Independence (Dandekar 1988). In the 1950s and 1960s, both states had high economic growth rates that stabilised in the 1970s and 1980s but still remained higher than other Indian states (Dandekar 1988).
The next area we would like to highlight is the Ganga Basin, which shows a remarkable lack of sites. This river basin is geographically extensive, roughly traversing the states of Uttar Pradesh, Uttarakhand, Madhya Pradesh, Chhattisgarh, Bihar, Jharkhand, West Bengal, and parts of Haryana and Delhi. The Ganga basin lies on the Indo-Gangetic plain, which is also the area where many of the Bronze Age sites of the Indus Valley Civilization are located (Kenoyer 2006). The Ganga basin has been fluvially active throughout the Quaternary and therefore there is a constant deposition of silt in the Ganga floodplain (see Pant and Sharma 1993; Singh 1996). This has presumably caused the oldest prehistoric evidence, i.e. the Lower Palaeolithic record, in this region to be deeply buried under the alluvium, making it difficult to find such sites (Misra 2001). However, this methodological disadvantage is not the only factor; it is compounded by the weak economic position and unstable social and political conditions of the concerned states.
The states of Uttar Pradesh and Bihar (Bihar at the time included the modern-day territory of Jharkhand state, which was created in 2000) were two of the economically lowest performing states in the period following Independence, well below the all-India-average for per capita NDP (Dandekar 1988). Their relative position only declined with time and they remained at the bottom of the list with even greater economic disparity in the 1980s. To illustrate the stark differences between the positions of the states, the national average growth rate of India was 3.57% in the 1980s while Uttar Pradesh had an average growth rate of only 3.05% and Bihar a mere 2.41%. Both states not only had a low growth rate but also a dwindling rate of population growth.
A few eastern states in the Ganga basin region also come under the ‘Red Corridor’/‘Naxal Belt’ of India and have been plagued with intermittent but continuous insurgency movements. The Naxal Belt comprises the states of Andhra Pradesh, Bihar, Chhattisgarh, Jharkhand, Madhya Pradesh, Maharashtra, Odisha, Telangana and West Bengal. There is a close relationship between the development of insurgency and neglected tribal populations (Jaiswal 2020). The origin of Naxalism can be traced back to an incident in 1967 in Naxalbari village, West Bengal, in which a group of local tribals and other ‘backward caste’ cultivators rose up against feudal practices of exploitation, in which upper caste landlords denied them their share in agricultural produce and the payment of fair wages (Jaiswal 2020). This protest became an expanded people’s movement in subsequent years. The Naxal movement was declared a terrorist organisation by the government of India in 1967. This charged atmosphere is a deterrent for anyone who wants to conduct fieldwork in these states even if historical tensions have calmed down. For this reason, many unexplored areas with high research potential within these states remain largely inaccessible and/or risky to work in. The few sites in this region that are present on our map are those found in the Son and Belan River valleys, where Allahabad University undertook extensive work under the supervision of G.R. Sharma (Misra and Nagar 1973).
Another region with a high number of reported sites is the state of Madhya Pradesh in Central India. Economically, Madhya Pradesh was in a weak position during this time. It was below the all-India-average of per capita NDP in the post-Independence period and its economic growth further slowed down in the 1960s and 70s due to a burgeoning population and low economic growth (Dandekar 1988). It is also another region affected by the Naxalite movement; however, it still has a comparatively higher density of Lower Palaeolithic sites. The Narmada basin extends across the states of Gujarat, Maharashtra and Madhya Pradesh, with large parts of it located within Madhya Pradesh. It is rich in Palaeolithic evidence; the only known early hominin fossil in South Asia comes from this region at Hathnora (Sonakia and Biswas 1998). It is considered an important area for Palaeolithic studies because its conditions are more conducive to fossil preservation than those in most other parts of the country (barring the Siwalik Hills and Son Valley). A.P. Khatri was one of the first researchers who drew attention to this area in 1962 when he reported Oldowan/Mahadevian artefacts from the site of Mahadeo-Piparia (IAR 1962); this has since been revised (see Supekar 1968) and the unique techno-morphological nature of its lithic assemblages has been addressed by researchers through new field visits and specimen collections (e.g. Chauhan 2009b). The example of Madhya Pradesh shows that economic and political factors do not consistently explain the geographic distribution and density of sites; there can be areas that have a high number of sites despite the region’s socio-economic-political issues.
Keeping these biases and socio-economic differences in mind will allow for more well-rounded and nuanced interpretations of the historical Palaeolithic record in India. Spatial gaps and dense clusters are better understood within the wider modern context of when and how sites were discovered than by assuming that they depict an actual pattern of Palaeolithic occupation; indeed, numerous zones still require systematic surveys and many regions preserve buried evidence that remains invisible or inaccessible for mapping and study. The factors mentioned above might be only partly applicable in some cases, and thus should be thoroughly evaluated before making any definite claims. NDP and research attention are not linked in every region, and even where the correlation appears strong, the link is not unequivocal. We merely wish to draw attention to the many socio-economic factors that often influence these results in the background. It is not possible to distinguish bias from true patterns everywhere, and it is even more difficult to do so when working with low-resolution data. A more thorough examination of contributing factors can take place only when data indicating the true presence and absence of sites is more widely available, so that patterns can be teased out of such data at political and ecological levels. This will allow more thorough historical and archaeological interpretations of the South Asian Palaeolithic record. It will also facilitate discourse on specific regions within South Asia that have not yet been well explored and lead to more productive and geographically and contextually unbiased fieldwork.
4. GIS: Applications for an Emerging India
The previous sections have focused on the specific problems that arise from geocoding historical, non-Western datasets and on why any spatial patterns that emerge cannot be interpreted at face value (even after comprehensive mapping of all known sites in the future). Considering the socio-economic disparities and barriers of access that have affected archaeological research in India, the increased use of GIS is one possible way to mitigate these barriers. GIS is not just a tool for spatial thinking, but also one for encouraging and sustaining long-term multidisciplinary research. Its proper use can help overcome many cost- and skill-related hurdles that researchers have faced in the past, and level the playing field for archaeological research. It is also a cost-effective strategy for India, where, as in other developing countries, resources for obtaining new locational data or verifying published locational data in the field are often scant. Cultural resource management and development are relatively low priorities in countries with limited funding, and prehistoric research ranks lower still. For instance, of all the Lower Palaeolithic sites discovered during this period, only 2.9% (n = 46) were excavated, based on IAR information. A lack of proper dating, limited stratified deposits and a low number of excavations have made the Palaeolithic record of India difficult to study, interpret and publish; as a result, it is missing from most published narratives on global human evolution and dispersals (Figure 7). Relative to the region’s geographical size, the quantum of material culture from the Palaeolithic era remains comparatively low (James and Petraglia 2005; Chauhan 2009a; 2009b). In that regard, GIS can help narrow down where to commit resources and support informed decisions about where to conduct field explorations or excavations. It can also help to explore remotely those areas that are currently difficult to access and survey.

Figure 7
Location of Lower Palaeolithic sites in South Asia discovered between 1952–1976 and reported in IAR, both excavated and unexcavated.
Traditionally, GIS is considered an important tool for spatial analysis of various types of data, including archaeological data. With the evolution of the digital landscape, GIS results are taking on an increasingly social dimension. Data documentation and sharing have inherent social and political dynamics, with the degree of access often limited by language and technological barriers. In the case of the IARs, the record is further limited by the fact that local languages were not used for data collection and documentation. This not only causes issues of accuracy, as demonstrated earlier, but also excludes many of India’s populations from participating in archaeological research, as they do not have the required knowledge to read or write in English. English remains one of the most widely spoken languages in India, but access to education in English is not consistent across social and economic divisions. The use of the English language in India is a colonial legacy and is not class neutral (Borooah and Iyer 2005; Chaudhary 2015). It is deeply rooted in the power structures of the British colonial era, as often only the upper classes/castes had the resources to educate themselves in English (Annamalai 2003).
The idea of using GIS as a media tool has increasingly taken shape, especially with the growth of volunteered geographic information (VGI), the GeoWeb and Web-based mapping (Sui and Goodchild 2001; 2011). Its further development can go a long way towards addressing these problems of access by delivering information in a format people can use and by making data accessible in local languages, especially as the internet provides a welcoming avenue. This research solely uses the ESRI-based ArcGIS World Geocoding Service, which requires a subscription (such as ArcGIS Online), potentially preventing other users from replicating the proposed workflow or expanding it to different regions or chronological periods. The decision to use ESRI was due to the author’s familiarity with the software. For stakeholders in cultural heritage and young South Asian researchers, access to open-source GIS platforms is very relevant. This geocoding process can also be replicated in QGIS, which is free to use at a global level [7], and in India Place Finder [8], although the exact ESRI algorithm is proprietary and there will be minor differences between geocoding services.
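Because geocoding services differ, one way to keep such a workflow portable is to separate the lookup logic from the service that answers the queries. The following is a minimal sketch of that idea, not the workflow used in this study: it assumes each site record carries hypothetical `village`, `district` and `state` fields, takes the geocoder as a plain function so that any service could be substituted, and records which administrative level produced the match. The toy gazetteer and its coordinates are invented placeholders and do not refer to real places.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)
# Any geocoding service can be wrapped as a function from a query string
# to coordinates, returning None when nothing is found.
Geocoder = Callable[[str], Optional[Coordinates]]

# Hypothetical field names, ordered from finest to coarsest level.
LEVELS = ("village", "district", "state")


@dataclass
class GeocodeResult:
    query: str           # the query string that produced the match
    level: str           # administrative level at which the match was made
    coords: Coordinates  # coordinates returned by the geocoder


def geocode_with_fallback(site: Dict[str, str], geocode: Geocoder) -> Optional[GeocodeResult]:
    """Try the finest available level first, then progressively coarser ones."""
    for i, level in enumerate(LEVELS):
        if not site.get(level):
            continue  # this level is missing from the record
        # Build the query from this level plus all coarser levels present.
        parts = [site[name] for name in LEVELS[i:] if site.get(name)]
        query = ", ".join(parts + ["India"])
        coords = geocode(query)
        if coords is not None:
            return GeocodeResult(query=query, level=level, coords=coords)
    return None  # no level could be matched


# Toy stand-in for a real geocoding service (placeholder names and coordinates).
toy_gazetteer = {"Example District, Example State, India": (10.0, 20.0)}

result = geocode_with_fallback(
    {"village": "Unknown Village", "district": "Example District", "state": "Example State"},
    toy_gazetteer.get,
)
print(result)
```

Keeping the matched level alongside the coordinates makes it explicit when a site has only been placed at district or state resolution, so that coarse matches are not later mistaken for precise locations.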
This ties in with the importance of databases, which allow other people to begin using this information for their own research purposes. The successful use of databases has been showcased by multiple projects, such as the J.J. Wymer archive, the Atlas of Hillforts project and the Grave Goods project (Pouncett 2019; Cooper et al. 2021). For South Asia, new locational studies and databases that utilise remote sensing, mapping and/or GIS are increasingly used for interpretations of the known archaeological record as well as for predictive modelling (e.g. Roberts et al. 2021; Berganzo-Besga et al. 2023), including for the Palaeolithic record (Pappu et al. 2010; Sukumaran et al. 2023; Chauhan et al. in progress).
The term ‘legacy data’ suggests that this data must be processed and modified before it can be properly utilised in a digital environment (Allison 2008). India has a high quantity of such legacy data with valuable archaeological information, in the form of the IARs and other comparable journals established in pre-Independence times, that has the potential to be a great asset to archaeological researchers focusing on South Asia. Working with the nuances of recontextualising such information for present use is a problem that is not unfamiliar to archaeologists and historians (see Wylie 2017; Bogdani 2020; Fitton et al. 2023; Fletcher 2023). Petrie et al. (2018) have established a reliable workflow for georeferencing historical Survey of India maps that can be used to detect archaeological mound features with the help of modern GIS software in India and Pakistan (see also Green et al. 2019). There is also a possibility of incorporating machine learning to extract features from these historical maps (Orengo et al. 2020; Garcia-Molsosa et al. 2021). These studies demonstrate that while the use of legacy data for archaeological purposes comes with its challenges, such data is nevertheless an important and underutilised resource for India and Pakistan. Similar historical datasets can be explored from other South Asian countries and may yield equally valuable information.
The widespread use of GIS and existing archaeological and palaeoanthropological databases (e.g. Kandel et al. 2023) can greatly enhance the quality and quantity of information available for the South Asian Palaeolithic; the method presented in this paper can also be used for records of younger time periods (i.e. Mesolithic, Neolithic, protohistoric and historical). It will open an avenue of access for people and communities who have been historically and intellectually marginalised, and will also establish a tradition of data sharing that significantly facilitates the continuity of such research on Old World palaeoanthropology.
5. Conclusion
The history of archaeology in India is intrinsically interwoven with a series of complex and long-standing economic and socio-political processes, in many instances stemming from a legacy of imperialism and colonial rule. Accounting for the ways in which these socio-economic and political factors affected research traditions (directly or indirectly) is essential to understanding the cultural and spatial dimensions of archaeological investigations and datasets in India. These legacies of ‘Empire’ come to the forefront even through the performance of a simple spatial exercise of geocoding an archaeological dataset, particularly because of the use of the English language in archaeological documentation. A holistic interpretation of the distribution patterns of Palaeolithic/prehistoric sites in India is only possible if the historical biases in regional coverage and data quality that emerge from these socio-political contexts are also kept at the forefront, since resource inequality is a major driver of archaeological research imbalances between different Indian states and other political and administrative units. Spatial distributions of Lower Palaeolithic sites in India do not merely show patterns of early hominin land use, ecological adaptations and geographic occupation, but also provide clues to the different types of inequalities inherent in the archaeological education and research of the country. GIS is therefore an important methodological asset in bringing these inequalities to the forefront through data visualisation. Essentially, we have attempted to maximise the historical and analytical value of a highly challenging legacy dataset from one of the richest archaeological regions in the world, and hope it will lead to analyses of comparable data from other archaeological time periods and regions.
Furthermore, the creation and curation of accessible databases in different languages, including local vernaculars, will better facilitate both inter-regional and intra-regional collaboration while simultaneously alleviating barriers to community-led research at the data coding stage. At the moment, ESRI only appears to support geocoding in India under Level 3 (https://developers.arcgis.com/rest/geocode/api-reference/geocode-coverage.htm), which leads to less accurate results, as we have demonstrated earlier in this paper. In that respect, alternative methodological and interpretative solutions need to be sought, as we have attempted to demonstrate in this study. In this paper we are not attempting to replace the processes of ground-truthing and cross-corroboration, but rather to demonstrate the ways in which computational archaeology and GIS can be brought into the study of South Asian archaeology. Such endeavours can also be significantly beneficial in decolonising historical interpretations and conceptual frameworks in South Asian archaeology and palaeoanthropology. This early analysis of historical heritage data also unintentionally brought to light the need for a defined format and process for describing and reporting archaeological sites in India, something that is still absent from modern archaeological practice and that would itself contribute to decolonisation. Finally, we anticipate that this study will encourage students and researchers to be cautious when interpreting historical contextual data and to pursue more accurate mapping of archaeological sites retrieved from such records.
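As a purely illustrative sketch of what a defined reporting format could look like in practice, the record structure below fixes a small set of fields and rejects obviously inconsistent entries. The field names, the optional `local_name` field for vernacular spellings and the validation rules are all assumptions made for this example; they are not a standard proposed by the ASI, the IAR or this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteRecord:
    """Hypothetical minimal record for reporting one archaeological site."""
    site_name: str                     # name as printed in the source report
    state: str
    district: str
    period: str                        # e.g. "Lower Palaeolithic"
    excavated: bool                    # False for surface finds and explorations
    source: str                        # where the site was reported
    latitude: Optional[float] = None   # decimal degrees, if known
    longitude: Optional[float] = None  # decimal degrees, if known
    local_name: Optional[str] = None   # name in the local language, if recorded

    def __post_init__(self) -> None:
        if not self.site_name.strip():
            raise ValueError("site_name must not be empty")
        # Coordinates are optional, but must be supplied as a pair.
        if (self.latitude is None) != (self.longitude is None):
            raise ValueError("latitude and longitude must be given together")
        if self.latitude is not None and not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must lie between -90 and 90 degrees")
        if self.longitude is not None and not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must lie between -180 and 180 degrees")


# Placeholder values only; this is not a real site.
record = SiteRecord(
    site_name="Example Site",
    state="Example State",
    district="Example District",
    period="Lower Palaeolithic",
    excavated=False,
    source="IAR (volume and page as printed)",
)
print(record)
```

Leaving the coordinates optional reflects the situation described in this paper, in which many historical reports name a locality without giving its position; such records can still be stored and geocoded later.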
Data Accessibility Statement
Data supporting this study is openly available from Zenodo at https://zenodo.org/doi/10.5281/zenodo.13627361.
Notes
[3] The ASI/NMMA website has PDFs of IAR only from 1953 to 2001; subsequent issues are available only in hard copy and the most recent issues are currently backlogged in publication. Due to time constraints, the authors were able to compile and analyse data only up to 1976 for the purpose of a conference presentation. The compilation and analysis of the remaining sites is ongoing and will eventually include all IARs (online and hard copies), other journal articles, books, book chapters, unpublished dissertations and other obscure literature, in order to obtain a comprehensive geographic perspective on all sites belonging to all three Palaeolithic periods (Lower, Middle and Upper).
[5] It is important to note that this appreciation and knowledge of the historical implications will vary significantly between non-Indians and Indian researchers, the latter being more intimately familiar with the political/administrative changes made in the past.
[6] Another historical example of this is the race or motivation to locate new sites of the Indus Valley Civilization (IVC) in both India and Pakistan, following the partition of the latter from the former, when two of the most iconic IVC sites – Harappa and Mohenjo Daro – went to Pakistan (see Kumar 2020).
Acknowledgements
The authors would like to thank Dr. John Pouncett (University of Oxford) and Jonathan Lim (University of Arkansas) for their valuable feedback and input on this manuscript.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
V.V. and P.R.C. conceived of the presented idea. V.V. compiled the data, conducted the computational work, and developed the theory. P.R.C. verified the theory, added important insights, supervised and edited the manuscript. Both authors discussed the results and contributed to the final manuscript.
