Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan

Rafal Felbur; Marieke Meelen; Paul Vierthaler

doi:10.5334/johd.86

Figures & Tables

Figure 1

Pipeline for the overall procedure of cross-lingual alignment of Buddhist Chinese and Classical Tibetan.

Table 1

Summary of cosine similarity scores for Tibetan-Chinese glossary pairs within the new embedding spaces, by Chinese tokenisation method. The table shows the highest-scoring pair, the lowest-scoring pair, and descriptive statistics. Higher scores with a lower standard deviation indicate a more accurate embedding space.

Chinese embedding type	Most similar	Least similar	Median	Mean	SD
Character	0.9	-0.2	0.66	0.64	0.12
Hybrid 1	0.9	0.19	0.66	0.65	0.11
Hybrid 2	0.91	0.22	0.66	0.64	0.11
Word	0.92	0.3	0.67	0.67	0.11
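As a rough illustration of how the Table 1 columns relate to one another, the sketch below scores glossary pairs by cosine similarity and summarises the scores. It assumes each Tibetan-Chinese glossary pair is already represented as two vectors in the shared cross-lingual space. The `summarise_pairs` helper and the toy vectors are hypothetical. Using the sample (rather than population) standard deviation is also an assumption, since the table does not say which was used.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def summarise_pairs(pairs):
    """Summarise cosine scores for (tibetan_vec, chinese_vec) glossary pairs.

    Hypothetical helper mirroring the Table 1 columns.
    Uses the sample standard deviation (an assumption).
    """
    scores = [cosine(tib, chi) for tib, chi in pairs]
    return {
        "most_similar": max(scores),
        "least_similar": min(scores),
        "median": statistics.median(scores),
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # needs at least two pairs
    }


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical).
toy_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # same direction: cosine 1.0
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal: cosine 0.0
    ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),  # 45 degrees apart: cosine ~0.707
]

for name, value in summarise_pairs(toy_pairs).items():
    print(f"{name}: {value:.2f}")
```

With real embeddings, computing one such summary per Chinese tokenisation method would yield one row of the table.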

Figure 2

A sample of embeddings from the cross-lingual Tibetan-Chinese space, including animal, numerical, seasonal, and directional words.

A zoomed-in detail of some of the animal words from the cross-lingual embedding space shown in Figure 1, with English translations.

Table 2

Results for all texts using each of the four embedding methods for the Chinese input.

Text – Chinese embedding type	% Rank 1	% Rank 5	% Rank 10	% Rank 15	Av. rank	# Zero
Text 1 – Character	30.95	69.05	78.57	92.86	4.33	2
Text 1 – Hybrid 1	35.71	69.05	88.1	92.86	3.56	0
Text 1 – Hybrid 2	40.48	73.81	90.48	95.24	3.4	0
Text 1 – Word	38.1	61.9	76.19	85.71	3.92	2
Text 2 – Character	76.19	100	100	100	1.24	0
Text 2 – Hybrid 1	52.38	100	100	100	2	0
Text 2 – Hybrid 2	61.9	100	100	100	1.57	0
Text 2 – Word	42.86	95.24	100	100	2.48	0
Text 3 – Character	35.29	47.06	52.94	70.59	4.58	1
Text 3 – Hybrid 1	35.29	64.71	82.35	88.23	3.53	0
Text 3 – Hybrid 2	35.29	58.82	82.35	82.35	3.36	0
Text 3 – Word	11.76	52.94	70.59	70.59	3.92	2
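The rank-based columns of Table 2 can be read as retrieval metrics. Under the assumed reading that % Rank k is the percentage of test items whose correct counterpart appears among the top k candidates, and that Av. rank is the mean 1-based position of that counterpart, the sketch below computes both. The helpers `rank_of_gold` and `rank_metrics` are hypothetical. This section does not define # Zero, so items without a usable rank are marked `None`, counted as misses at every cutoff, and left out of the mean (also an assumption).

```python
def rank_of_gold(similarities, gold_index):
    """1-based rank of the gold candidate when candidates are sorted by
    descending similarity; ties are resolved in the gold candidate's favour."""
    gold = similarities[gold_index]
    return 1 + sum(1 for s in similarities if s > gold)


def rank_metrics(ranks, cutoffs=(1, 5, 10, 15)):
    """Percentage of items ranked within each cutoff, plus the mean rank.

    `ranks` holds 1-based ranks; None marks an item with no usable rank
    (hypothetical). None counts as a miss for every cutoff and is left
    out of the mean rank (an assumption).
    """
    total = len(ranks)
    found = [r for r in ranks if r is not None]
    result = {}
    for k in cutoffs:
        hits = sum(1 for r in found if r <= k)
        result[f"%rank{k}"] = round(100 * hits / total, 2)
    result["av_rank"] = round(sum(found) / len(found), 2) if found else None
    result["n_none"] = total - len(found)
    return result


# Hypothetical ranks for five test sentences.
example_ranks = [1, 3, 1, 12, None]
print(rank_metrics(example_ranks))
# {'%rank1': 40.0, '%rank5': 60.0, '%rank10': 60.0, '%rank15': 80.0, 'av_rank': 4.25, 'n_none': 1}
```

Because the percentages are cumulative, each % Rank column can only stay level or rise from left to right, which matches the pattern in every row of the table.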

Figure 5

Top-ranked results for each Chinese embedding method, by text.
