Computer-Assisted Language Comparison: State of the Art

Open Access
May 2020

Figures & Tables

johd-6-12-g1.png
Figure 1

An overview of the workflow.

johd-6-12-g2.png
Figure 2

The geographic distribution of the Hmong-Mien languages selected for our sample.

Table 1

A minimal example for four words in four Germanic languages, given in our minimal tabular format. The column VALUE (which is not required) provides the orthographical form of each word [20, 21].

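| ID | DOCULECT | CONCEPT | VALUE | TOKENS |
|----|----------|---------|-------|--------|
| 1 | English | house | house | h aʊ s |
| 2 | German | house | Haus | h au s |
| 3 | Dutch | house | huis | h ʊɪ s |
| 4 | Swedish | house | hus | h ʉː s |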
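The minimal tabular format of Table 1 can be read with a few lines of standard-library Python. The sketch below assumes the table is stored as tab-separated text; the names `WORDLIST_TSV` and `read_wordlist` are illustrative and are not taken from the article or from any library.

```python
import csv
import io

# Table 1 written as tab-separated text (the TSV layout is an assumption).
WORDLIST_TSV = (
    "ID\tDOCULECT\tCONCEPT\tVALUE\tTOKENS\n"
    "1\tEnglish\thouse\thouse\th aʊ s\n"
    "2\tGerman\thouse\tHaus\th au s\n"
    "3\tDutch\thouse\thuis\th ʊɪ s\n"
    "4\tSwedish\thouse\thus\th ʉː s\n"
)

# VALUE is optional according to the caption, so it is not required here.
REQUIRED = ("ID", "DOCULECT", "CONCEPT", "TOKENS")


def read_wordlist(text):
    """Return {ID: row}, with TOKENS split into a list of segments."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [col for col in REQUIRED if col not in reader.fieldnames]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    wordlist = {}
    for row in reader:
        # Segments are separated by spaces in the TOKENS column.
        row["TOKENS"] = row["TOKENS"].split()
        wordlist[int(row["ID"])] = row
    return wordlist


wordlist = read_wordlist(WORDLIST_TSV)
for idx, row in wordlist.items():
    print(idx, row["DOCULECT"], row["CONCEPT"], row["TOKENS"])
```

Keeping TOKENS as a list of segments rather than a plain string means that later steps can address individual sounds without re-parsing the transcription.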
johd-6-12-g3.png
Figure 3

An example to illustrate the usage of orthography profiles to tokenize the phonetic transcriptions.

johd-6-12-g4.png
Table 2

The transformation from raw to machine-readable data. As illustrated in Table 1, the VALUE column displays the raw form. The tokenized forms are added to the TOKENS column.
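The step from VALUE to TOKENS can be sketched as a greedy longest-match segmentation against an orthography profile. The profile below is a hypothetical toy example, not the profile used in the study, and the `tokenize` function is a simplified stand-in rather than the tool applied by the authors.

```python
def tokenize(form, profile):
    """Segment `form` by greedy longest match against `profile`.

    `profile` maps graphemes (possibly several characters long) to the
    segment they should be written as. Characters not covered by the
    profile are wrapped in angle brackets so they are easy to spot.
    """
    longest = max(map(len, profile))
    tokens, i = [], 0
    while i < len(form):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(form) - i), 0, -1):
            chunk = form[i:i + size]
            if chunk in profile:
                tokens.append(profile[chunk])
                i += size
                break
        else:
            tokens.append(f"<{form[i]}>")
            i += 1
    return tokens


# Hypothetical profile for illustration only.
PROFILE = {"tsh": "tsʰ", "ts": "ts", "h": "h", "a": "a", "ŋ": "ŋ", "³": "³"}

print(" ".join(tokenize("tshaŋ³", PROFILE)))
print(" ".join(tokenize("tsa?", PROFILE)))
```

Marking unknown characters instead of dropping them makes gaps in the profile visible, so the profile can be extended and the data re-tokenized.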

johd-6-12-g5.png
Figure 4

A comparison of full cognates (COGID) and partial cognate sets (COGIDS). While none of the four words is fully cognate with any of the others, they all share a common element. Note that the IDs for full cognates and partial cognates are independent of each other. For better visibility, the partial cognates shared among all language varieties are marked in red.
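The difference between the two annotation levels can be illustrated with a small sketch. The data are invented for illustration (four hypothetical varieties A to D, each with a two-morpheme word); only the column names COGID and COGIDS come from the caption.

```python
from itertools import combinations

# Hypothetical toy data: one COGID per word, one COGIDS entry per morpheme.
# The two ID spaces are independent of each other.
WORDS = {
    "A": {"COGID": 1, "COGIDS": [1, 2]},
    "B": {"COGID": 2, "COGIDS": [1, 3]},
    "C": {"COGID": 3, "COGIDS": [4, 1]},
    "D": {"COGID": 4, "COGIDS": [1, 5]},
}


def fully_cognate_pairs(words):
    """Pairs of varieties whose words share the same full cognate ID."""
    return [
        (a, b)
        for a, b in combinations(sorted(words), 2)
        if words[a]["COGID"] == words[b]["COGID"]
    ]


def shared_partial_cognates(words):
    """Partial cognate IDs that occur in every word."""
    return set.intersection(*(set(w["COGIDS"]) for w in words.values()))


print(fully_cognate_pairs(WORDS))
print(shared_partial_cognates(WORDS))
```

In this toy setting no pair of words is fully cognate, yet one morpheme ID is shared by all four words, which mirrors the situation described in the caption.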

johd-6-12-g6.png
Figure 5

The alignment of ‘sun’ (cognate ID 1) among 4 Hmong-Mien languages, with segments colored according to their basic sound classes. The table on the left shows the cognate identifiers for cognate morphemes, as discussed in Figure 4. The table on the right shows how the cognate morphemes with identifier 1 (basic meaning ‘sun’) are aligned.

johd-6-12-g7.png
Figure 6

Illustration of the template-based alignment procedure. a) Representing prosodic structure reflecting syllable templates for each morpheme in the data. b) Aligning tokenized transcriptions to templates, and deleting empty slots.
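The two steps of the procedure can be sketched as follows. The five-slot template (initial, medial, nucleus, coda, tone) and the example morphemes are assumptions made for illustration; the function names are hypothetical and the sketch is not the authors' implementation.

```python
TEMPLATE = ("i", "m", "n", "c", "t")  # initial, medial, nucleus, coda, tone
GAP = "-"


def to_template(tokens, structure):
    """Place each token in the template slot named by its structure label."""
    if len(tokens) != len(structure):
        raise ValueError("tokens and structure must have the same length")
    if len(set(structure)) != len(structure) or set(structure) - set(TEMPLATE):
        raise ValueError(f"invalid structure: {structure}")
    slots = dict(zip(structure, tokens))
    return [slots.get(position, GAP) for position in TEMPLATE]


def align(entries):
    """Align morphemes to the template, then delete slots empty in all rows."""
    rows = [to_template(tokens, structure) for tokens, structure in entries]
    keep = [k for k in range(len(TEMPLATE)) if any(r[k] != GAP for r in rows)]
    return [[row[k] for k in keep] for row in rows]


# Hypothetical morphemes: (tokenized form, prosodic structure).
ENTRIES = [
    ("n a ŋ ¹".split(), "i n c t".split()),
    ("n o ¹".split(), "i n t".split()),
    ("a ŋ ¹".split(), "n c t".split()),
]

for row in align(ENTRIES):
    print("\t".join(row))
```

Because every segment is assigned to a fixed slot, no pairwise scoring is needed; the medial slot is dropped here only because none of the toy morphemes fills it.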

johd-6-12-g8.png
Table 3

Examples of compound words in Hmong-Mien languages. The column MORPHEMES uses morpheme glosses [31] to indicate which of the words are cognate within the same language. The form for ‘net’ is included to show that ‘bed’ and ‘net’ are not colexified, and that ‘fishnet’ is instead an analogical compound word.

johd-6-12-g9.png
Table 4

Two glosses from [8], ‘son’ and ‘daughter’, are shown here to illustrate the difference between cognates inside meaning slots and cognates across meaning slots.

johd-6-12-g10.png
Figure 7

A comparison of the alignments for morphemes meaning ‘son’ and ‘daughter’, illustrating how cross-semantic cognates can be identified. Cognate sets in which the forms in the individual languages are identical are clustered together and assigned a unique cross-semantic cognate identifier (CROSSID). Those that are not compatible, such as cognate sets 2 and 1 in our example, are left separate.
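The clustering idea can be sketched with a simple greedy procedure. The compatibility criterion used here (identical forms in every shared variety, and at least one shared variety) is a simplified reading of the caption, and the data, variety names, and function names are hypothetical.

```python
def compatible(a, b):
    """True if two cognate sets have identical forms in all shared varieties."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[lang] == b[lang] for lang in shared)


def assign_crossids(cogsets):
    """Greedily cluster compatible cognate sets and number the clusters."""
    clusters = []  # each item: (merged forms per variety, member keys)
    for key in sorted(cogsets):
        forms = cogsets[key]
        for merged, members in clusters:
            if compatible(merged, forms):
                merged.update(forms)
                members.append(key)
                break
        else:
            clusters.append((dict(forms), [key]))
    return {
        key: crossid
        for crossid, (_, members) in enumerate(clusters, start=1)
        for key in members
    }


# Hypothetical cognate sets, keyed by (concept, cognate ID within the concept).
COGSETS = {
    ("son", 1): {"A": "t u ¹", "B": "t o ¹"},
    ("son", 2): {"A": "p a ²", "B": "p e ²"},
    ("daughter", 1): {"A": "t u ¹", "B": "t o ¹"},
    ("daughter", 2): {"A": "m a ²", "B": "m e ²"},
}

for key, crossid in sorted(assign_crossids(COGSETS).items()):
    print(key, crossid)
```

A greedy pass depends on the order in which sets are visited, so the keys are sorted to keep the result reproducible.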

Table 5

An example of correspondence sets in the classical literature, following Ratliff [11, p. 75]. Reconstructed forms for Proto-Hmong-Mien are preceded by an asterisk.

1 2 3 4 5 6 7 8 9 10 11
blood [*ntshjamX]: ɕhaŋ³ ȵtɕhi³ ɳtʂha³ ntsua³ᵇ nʔtshenᴮ θi³ ȵe³ ɕam³ saːm³ san³ dzjɛm³
head louse [*ntshjeiX]: ɕhu³ ȵtɕhi³ ɳtsau³ᵇ ntsɔ³ᵇ nʔtshuᴮ tɕhi³ ɕeib³ tθei³ dzɛi³
to fear/be afraid [*ntshjeX]: ɕhi¹ ɳtʂai⁵ ntse⁵ᵇ nʔtsheC ɳtʃei¹ ȵɛ⁵ dʑa⁵ ȡa⁵’ ȡa⁵ dzjɛ⁵
clear [*ntshjiəŋ]: ɕhi¹ ɳtʂia¹ ntsæin¹ᵇ nʔtsheA nɪ̃¹ dzaŋ¹
Table 6

A summary of the results of the sound correspondence pattern inference algorithm applied to our data. The numbers in each column give the number of sound correspondence patterns detected at each position in the syllable.

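| Position | ‘Regular’ Patterns | Singletons |
|----------|--------------------|------------|
| Initial | 165 | 106 |
| Medials | 45 | 23 |
| Nucleus | 213 | 57 |
| Coda | 66 | 13 |
| Tone | 164 | 29 |
| Total | 653 | 228 |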
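The totals in Table 6 can be checked, and the share of singletons per position computed, with a few lines of Python. The counts are copied from the table; the share calculation is an added illustration and is not reported in the article.

```python
# Counts from Table 6: position -> ('regular' patterns, singletons).
TABLE_6 = {
    "Initial": (165, 106),
    "Medials": (45, 23),
    "Nucleus": (213, 57),
    "Coda": (66, 13),
    "Tone": (164, 29),
}

regular_total = sum(regular for regular, _ in TABLE_6.values())
singleton_total = sum(singletons for _, singletons in TABLE_6.values())


def singleton_share(regular, singletons):
    """Proportion of all detected patterns that are singletons."""
    return singletons / (regular + singletons)


shares = {
    position: round(singleton_share(*counts), 3)
    for position, counts in TABLE_6.items()
}

print(regular_total, singleton_total)
for position, share in shares.items():
    print(position, share)
```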
johd-6-12-g11.png
Table 7

Cells shaded in blue indicate the initial consonants belonging to a common correspondence pattern, with missing reflexes indicated by a Ø.

DOI: https://doi.org/10.5334/johd.12 | Journal eISSN: 2059-481X
Language: English
Published on: May 22, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Mei-Shin Wu, Nathanael E. Schweikhard, Timotheus A. Bodt, Nathan W. Hill, Johann-Mattis List, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.