A Global Lexical Database (GLED) for Computational Historical Linguistics

By: Tiago Tresoldi  
Open Access | Feb 2023

Figures & Tables

Table 1

Number of doculects per number of concepts, expressed in absolute and relative terms. Note that, when a doculect has synonyms, its number of entries will be higher than its number of concepts.

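| NUMBER OF CONCEPTS | DOCULECTS | PERCENTAGE OF DOCULECTS |
|---|---|---|
| 30 | 330 | 5.0 |
| 31 | 306 | 4.7 |
| 32 | 361 | 5.5 |
| 33 | 401 | 6.1 |
| 34 | 595 | 9.1 |
| 35 | 627 | 9.5 |
| 36 | 786 | 12.0 |
| 37 | 605 | 9.2 |
| 38 | 627 | 9.5 |
| 39 | 736 | 11.2 |
| 40 | 1198 | 18.2 |
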
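The percentages in Table 1 can be reproduced from the absolute counts. The following is a minimal Python sketch, assuming that each percentage is the count divided by the sum of all counts listed in the table and rounded to one decimal place. The variable names are illustrative and are not taken from the GLED code base.

```python
# Doculect counts per number of concepts, copied from Table 1.
doculects_per_concept_count = {
    30: 330, 31: 306, 32: 361, 33: 401, 34: 595, 35: 627,
    36: 786, 37: 605, 38: 627, 39: 736, 40: 1198,
}

# Assumption: the denominator is the sum of the counts listed in the table.
total_doculects = sum(doculects_per_concept_count.values())

# Percentage of doculects per number of concepts, rounded to one decimal.
percentages = {
    n_concepts: round(100 * count / total_doculects, 1)
    for n_concepts, count in doculects_per_concept_count.items()
}

for n_concepts, pct in percentages.items():
    print(f"{n_concepts} concepts: {pct:.1f}% of {total_doculects} doculects")
```

Under this assumption the counts sum to 6,572 doculects, and the same denominator appears to be consistent with the ratios reported per concept in Table 2 (for example, 5265 / 6572 ≈ 0.801).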
Figure 1

Location of the doculects included in the dataset, using information from Hammarström et al. (2022); colours are automatically assigned to differentiate language families.

Table 2

Absolute and relative doculect coverage per concept, along with the Concepticon mapping for each concept.

| CONCEPT GLOSS | DOCULECTS (RATIO) | CONCEPTICON NAME/ID |
|---|---|---|
| 1pl | 5265 (0.801) | WE/1212 |
| 1sg | 5379 (0.818) | I/1209 |
| 2sg | 5231 (0.795) | THOU/1215 |
| blood | 6426 (0.977) | BLOOD/946 |
| bone | 6351 (0.966) | BONE/1394 |
| breast | 5957 (0.906) | BREAST/1402 |
| come | 6130 (0.932) | COME/1446 |
| die | 6125 (0.931) | DIE/1494 |
| dog | 6430 (0.978) | DOG/2009 |
| drink | 6058 (0.921) | DRINK/1401 |
| ear | 6475 (0.985) | EAR/1247 |
| eye | 6494 (0.988) | EYE/1248 |
| fire | 6417 (0.976) | FIRE/221 |
| fish | 6226 (0.947) | FISH/227 |
| full | 4190 (0.637) | FULL/1429 |
| hand | 5693 (0.866) | HAND/1277 |
| hear | 5898 (0.897) | HEAR/1408 |
| horn | 4317 (0.656) | HORN (ANATOMY)/1393 |
| knee | 5357 (0.815) | KNEE/1371 |
| leaf | 6077 (0.924) | LEAF/628 |
| liver | 5454 (0.829) | LIVER/1224 |
| louse | 5711 (0.868) | LOUSE/1392 |
| mountain | 5321 (0.809) | MOUNTAIN/639 |
| name | 6042 (0.919) | NAME/1405 |
| new | 5711 (0.868) | NEW/1231 |
| night | 6289 (0.956) | NIGHT/1233 |
| nose | 6404 (0.974) | NOSE/1221 |
| one | 6296 (0.958) | ONE/1493 |
| path | 6151 (0.935) | PATH/2252 |
| person | 5552 (0.844) | PERSON/683 |
| see | 6104 (0.928) | SEE/1409 |
| skin | 6182 (0.940) | SKIN/763 |
| star | 6220 (0.946) | STAR/1430 |
| stone | 6290 (0.957) | STONE/857 |
| sun | 5877 (0.894) | SUN/1343 |
| tongue | 6430 (0.978) | TONGUE/1205 |
| tooth | 6399 (0.973) | TOOTH/1380 |
| tree | 5850 (0.890) | TREE/906 |
| two | 6285 (0.956) | TWO/1498 |
| water | 6413 (0.975) | WATER/948 |

Table 3

A modified snippet from the lexical dataset, showing the most critical columns for a subset of Tupian words for the concept “dog”. The data includes a unique language name, a Glottocode (when available), the family name, a concept gloss derived from the Concepticon catalog, the phonological transcription of the word, the phonological alignment of the word in its cognate set (with hyphens indicating gaps), and a cognate set index.

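| LANGUAGE | CODE | FAMILY | CONCEPT | FORM | ALIGNMENT | COGSET |
|---|---|---|---|---|---|---|
| Aché | ache1246 | Tupian | DOG | bɐegi | b ɐ e g i | 16 |
| Amundava | amun1246 | Tupian | DOG | ɲɐɲwɐrɐ | ɲ ɐ ɲ w - ɐ r ɐ | 17 |
| Avá Canoeiro | avac1239 | Tupian | DOG | jɐwɐrɐ | j ɐ - w - ɐ r ɐ | 17 |
| Paraguayan Guaraní | para1311 | Tupian | DOG | dʒɐgwɐ | dʒ ɐ g w - ɐ - - | 17 |
| Kaiwá | kaiw1246 | Tupian | DOG | jɐgwɐ | j ɐ g w - ɐ - - | 17 |
| Eastern Bolivian Guaraní | east2555 | Tupian | DOG | jeimbɐ | j e - i m b ɐ | 19 |
| Tapieté | tapi1253 | Tupian | DOG | ɲɐʔəmbɐ | ɲ ɐ ʔ ə m b ɐ | 19 |
| Cinta Larga | cint1239 | Tupian | DOG | ɐwəli | ɐ w ə l i | 20 |
| Gavião Do Jiparaná | gavi1246 | Tupian | DOG | ɐvələ | ɐ v ə l ə | 20 |
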
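To illustrate how the columns of Table 3 relate to each other, the following Python sketch groups the rows by cognate set and checks two properties that the snippet suggests: alignments within a cognate set have the same number of columns, and removing the gap symbols from an alignment gives back the form. The tuple layout and the function names (`group_by_cogset`, `strip_gaps`) are hypothetical choices made for this example; nothing is assumed here about the file format or the API of the actual dataset.

```python
from collections import defaultdict

# Rows copied from Table 3: (language, glottocode, form, alignment, cogset).
# Family ("Tupian") and concept ("DOG") are identical for all rows and omitted.
ROWS = [
    ("Aché", "ache1246", "bɐegi", "b ɐ e g i", 16),
    ("Amundava", "amun1246", "ɲɐɲwɐrɐ", "ɲ ɐ ɲ w - ɐ r ɐ", 17),
    ("Avá Canoeiro", "avac1239", "jɐwɐrɐ", "j ɐ - w - ɐ r ɐ", 17),
    ("Paraguayan Guaraní", "para1311", "dʒɐgwɐ", "dʒ ɐ g w - ɐ - -", 17),
    ("Kaiwá", "kaiw1246", "jɐgwɐ", "j ɐ g w - ɐ - -", 17),
    ("Eastern Bolivian Guaraní", "east2555", "jeimbɐ", "j e - i m b ɐ", 19),
    ("Tapieté", "tapi1253", "ɲɐʔəmbɐ", "ɲ ɐ ʔ ə m b ɐ", 19),
    ("Cinta Larga", "cint1239", "ɐwəli", "ɐ w ə l i", 20),
    ("Gavião Do Jiparaná", "gavi1246", "ɐvələ", "ɐ v ə l ə", 20),
]


def group_by_cogset(rows):
    """Map each cognate set index to the list of rows that belong to it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[4]].append(row)
    return dict(groups)


def strip_gaps(alignment):
    """Drop gap symbols ("-") and join the remaining segments into a form."""
    return "".join(seg for seg in alignment.split() if seg != "-")


cogsets = group_by_cogset(ROWS)

for cogset, members in sorted(cogsets.items()):
    # Within a cognate set, every alignment should have the same column count.
    lengths = {len(alignment.split()) for _, _, _, alignment, _ in members}
    assert len(lengths) == 1, f"cognate set {cogset} has ragged alignments"
    for language, _, form, alignment, _ in members:
        # Removing the gaps from an alignment should give back the form.
        assert strip_gaps(alignment) == form, language
    print(cogset, [language for language, *_ in members])
```

Storing alignments as space-separated segments, as in the table, keeps multi-character segments such as `dʒ` intact when the string is split.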
Figure 2

A neighbour-net for the Tupian languages in the dataset, plotted with SplitsTree v4 (Huson & Bryant, 2006).

Figure 3

The “global” language tree from the combined Bayesian MCMC phylogenetic inferences, plotted with iTOL (Letunic & Bork, 2021).

Table 4

Distance between Swedish (swed1254) and other languages, as computed using the Neighbour Joining trees (NJ, from zero to infinity), the Bayesian trees (B, from zero to 4.0), and the normalized Bayesian trees (NB, from zero to 1.0).

| LANGUAGE (GLOTTOCODE) | NJ | B | NB |
|---|---|---|---|
| Norwegian Bokmål (norw1259) | 0.21 | 0.11 | 0.02 |
| Danish (dani1285) | 0.24 | 0.02 | 0.01 |
| Dutch (dutc1256) | 0.41 | 1.40 | 0.35 |
| English (stan1293) | 0.42 | 1.40 | 0.35 |
| Italian (ital1282) | 0.84 | 1.60 | 0.40 |
| Hindi (hind1269) | 0.90 | 1.95 | 0.48 |
| Hittite (hitt1242) | 0.90 | 1.97 | 0.49 |
| Basque (basq1248) | | 4.00 | 1.00 |

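The caption gives the range of the Bayesian distances (zero to 4.0) and of their normalized counterpart (zero to 1.0). The sketch below assumes that the normalization is a plain division by the maximum distance of 4.0. This rule is inferred from the two ranges and is not confirmed by the source; the function name `normalize` is hypothetical. Because the table reports values with two decimals, the computed result can differ from the printed NB value in the last digit.

```python
MAX_BAYESIAN_DISTANCE = 4.0  # upper bound of B stated in the Table 4 caption


def normalize(b_distance, maximum=MAX_BAYESIAN_DISTANCE):
    """Scale a Bayesian tree distance to the range [0, 1] (assumed rule)."""
    return b_distance / maximum


# (language, B, NB) triples copied from Table 4.
TABLE_4 = [
    ("Norwegian Bokmål", 0.11, 0.02),
    ("Danish", 0.02, 0.01),
    ("Dutch", 1.40, 0.35),
    ("English", 1.40, 0.35),
    ("Italian", 1.60, 0.40),
    ("Hindi", 1.95, 0.48),
    ("Hittite", 1.97, 0.49),
    ("Basque", 4.00, 1.00),
]

for language, b, nb in TABLE_4:
    # Print the computed value next to the value reported in the table.
    print(f"{language}: B={b:.2f} -> {normalize(b):.4f} (table NB: {nb:.2f})")
```
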
DOI: https://doi.org/10.5334/johd.96 | Journal eISSN: 2059-481X
Language: English
Published on: Feb 2, 2023
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Tiago Tresoldi, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.