Abstract
Maps depicting the geographic location of languages are essential tools for linguistic research. Although many language maps are available in the scientific literature, most encode spatial information as static images, often on paper. In contrast, geographic databases store languages as georeferenced digital data, allowing integration with other datasets, quantitative geographic analyses, and mapping. At present, there is no open-access platform providing digital language areas. To fill this gap, we introduce Glottography, a free and open geolinguistic data platform for mapping the world’s languages. Glottography represents the speaker areas of the world’s languages as georeferenced spatial polygons, enriched with relevant metadata, including Glottocodes that link each polygon to a unique identifier in Glottolog, a database cataloguing the world’s dialects, languages, and language families. Glottography currently includes more than 13,000 language areas for 5,300 distinct languages, digitised from 29 source publications. For each source, the platform provides the data both in raw, unmodified form and aggregated to the level of languages and of language families, according to the classification in Glottolog. Glottography is accessible through Rglottography, an R package, and is accompanied by detailed tutorials on usage and data acquisition that encourage users to contribute new geodata to the platform. As the first open data source of its kind, Glottography enables computational analyses that explore the origins, distribution, and drivers of global linguistic diversity.
