| (1) | a. | ka-se | |
| | | 1SG-go | |
| | | ‘I am going’ | |
| | b. | se-mak-ung | |
| | | go-NEG-1SG | |
| | | ‘I am not going’ | Ranglong |
| (2) | a. | kaː-tà-tì-nʉ́ | |
| | | 1:P-touch-2-NFUT | |
| | | ‘you (SG) touch me’ | Anal Naga |
| | b. | m̩̀-m̥ú-náː-tʃɘ̀ | |
| | | INV-see-IPFV:TR-2 | |
| | | ‘you (SG) saw me’ | Monsang |
| (3) | a. | a-t-déé | |
| | | 2-INV-see | |
| | | ‘you (SG) see me’ | |
| | b. | m-t-déé | |
| | | 1-INV-see | |
| | | ‘you (SG) see me’ | Lamkang |

Table 1
Languages included in the first release of PMST, with identifiers, group affiliation, collaborators and sources. Languages are ordered by group. Within Northwestern, languages are ordered by assumed closeness of relationship, following the impressionistic subgroupings in Konnerth (2022).
| LANGUAGE | GLOTTOCODE | ISO CODE¹ | GROUP | COLLABORATORS & SOURCES |
|---|---|---|---|---|
| Ranglong | rang1271 | (rnl) | Northwestern | Hunter Brown, Jessi Tara |
| Chiru | chir1283 | cdf | Northwestern | Mechek Sampar Awan; Awan (2019) |
| Anal Naga | anal1239 | anm | Northwestern | Pavel Ozerov, Thotson Langhu; Ozerov (2019) |
| Monsang | mons1234 | nmh | Northwestern | Linda Konnerth, Koninglee Wanglar |
| Lamkang | lamk1238 | lmk | Northwestern | Shobhana Chelliah, Rex Rengpu Khullar; Chelliah et al. (2019) |
| Hmar | hmar1241 | hmr | Northwestern | Marina Infimate |
| Pangkhua | pank1249 | pkh | Central | Mohammed Zahid Akter; Akter (2024) |
| Hyow | khya1239 | (csh) | Southern | Muhammad Zakaria; Zakaria (in press) |

Figure 1
Location and group affiliation of the sample languages. The inset shows the location of the detailed map within South(east) Asia.
Table 2
Overview of tags used to annotate variation.
| TAG | CATEGORY OF TAG | DESCRIPTION |
|---|---|---|
| default | paradigm_tag | unmarked form (most general, most frequent, etc.) or form that has no other tag |
| pragm_marked | paradigm_tag | pragmatically conditioned variant |
| hort | paradigm_tag | form is a hortative |
| emph | paradigm_tag | form is from an emphatic paradigm |
| unspec_var | paradigm_tag | variant of (yet) unspecified distribution |
| generic_nf | tense_tag | generic non-future form |
| non_generic_nf | tense_tag | non-generic non-future form |
| past | tense_tag | past tense form |
| optional_plural | overabundance_tag | form that does not contain a marker for plural |
| optional_third | overabundance_tag | form that does not contain a marker for third person |
| optional_future | overabundance_tag | form that does not contain a marker for future tense |
| variable_order | order_tag | form contains morphemes that can variably order |
| special_stem | morphanalysis_tag | form has a special stem form in particular cells of a paradigm |
| tone_alt_stem | morphanalysis_tag | form exhibits a tone alternation triggered by the stem |
| morphophon | morphanalysis_tag | form exhibits morphophonological process(es) |
| copy_v | phonanalysis_tag | form has a copy vowel in at least one morpheme |
| dialect_var | variants_tag | form from other dialect |
| sociolect_var | variants_tag | form from other sociolect |
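
The tag inventory in Table 2 can be read as a small lookup table. The Python sketch below is an illustration only: the mapping is transcribed from Table 2, but the function name `group_tags` and the idea of passing a form's tags as a list of strings are assumptions, not the layout of the PMST files. It groups the tags of one form by category and rejects tags that are not in the inventory.

```python
# Tag-to-category mapping transcribed from Table 2.
TAG_CATEGORIES = {
    "default": "paradigm_tag",
    "pragm_marked": "paradigm_tag",
    "hort": "paradigm_tag",
    "emph": "paradigm_tag",
    "unspec_var": "paradigm_tag",
    "generic_nf": "tense_tag",
    "non_generic_nf": "tense_tag",
    "past": "tense_tag",
    "optional_plural": "overabundance_tag",
    "optional_third": "overabundance_tag",
    "optional_future": "overabundance_tag",
    "variable_order": "order_tag",
    "special_stem": "morphanalysis_tag",
    "tone_alt_stem": "morphanalysis_tag",
    "morphophon": "morphanalysis_tag",
    "copy_v": "phonanalysis_tag",
    "dialect_var": "variants_tag",
    "sociolect_var": "variants_tag",
}


def group_tags(tags):
    """Group a form's tags by category (hypothetical helper, not part of PMST).

    Raises ValueError if any tag is missing from the Table 2 inventory.
    """
    unknown = [tag for tag in tags if tag not in TAG_CATEGORIES]
    if unknown:
        raise ValueError(f"unknown tag(s): {unknown}")
    grouped = {}
    for tag in tags:
        grouped.setdefault(TAG_CATEGORIES[tag], []).append(tag)
    return grouped
```

For instance, `group_tags(["default", "past", "copy_v"])` places one tag each under `paradigm_tag`, `tense_tag` and `phonanalysis_tag`.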

Figure 2
Schematic overview of workflow and the connection between the working versions of the datasets on GitHub and the published versions on Zenodo. A, B, C represent individual languages.
Table 3
Dataset description.
| Repository name | Zenodo |
|---|---|
| Object name | PMST-Database |
| Repository location | All PMST datasets can be found at https://zenodo.org/communities/pmst/. For DOIs of individual datasets, please consult Table 4. |
| Format names | csv, json, md, yml |
| Creation dates | 2023-12-27 to 2025-12-10 |
| Publication date | The datasets pertaining to the first release of PMST were published between 2025-12-01 and 2025-12-10. |
| License | CC-BY-SA 4.0 |
Table 4
Languages (= datasets) included in the first release of PMST, with the number of forms, the number of scenarios,⁶ and their DOIs.
| LANGUAGE | FORMS | SCENARIOS | ZENODO DOI |
|---|---|---|---|
| Anal Naga | 311 | 184 | 10.5281/zenodo.17881855 |
| Chiru | 674 | 165 | 10.5281/zenodo.17779437 |
| Hmar | 267 | 163 | 10.5281/zenodo.17779055 |
| Hyow | 917 | 352 | 10.5281/zenodo.17788529 |
| Lamkang | 298 | 158 | 10.5281/zenodo.17780049 |
| Monsang | 336 | 163 | 10.5281/zenodo.17865713 |
| Pangkhua | 149 | 145 | 10.5281/zenodo.17866617 |
| Ranglong | 255 | 142 | 10.5281/zenodo.17778036 |
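
As a quick arithmetic check on Table 4, the sketch below computes the total number of forms and scenarios and the average number of forms per scenario for each language. The counts are transcribed from the table; the names `COUNTS` and `forms_per_scenario` are introduced here for illustration only. How to interpret the ratio (for example, as a rough indicator of how many variant forms are recorded per scenario) is an assumption, not a claim made by the dataset description.

```python
# (forms, scenarios) per language, transcribed from Table 4.
COUNTS = {
    "Anal Naga": (311, 184),
    "Chiru": (674, 165),
    "Hmar": (267, 163),
    "Hyow": (917, 352),
    "Lamkang": (298, 158),
    "Monsang": (336, 163),
    "Pangkhua": (149, 145),
    "Ranglong": (255, 142),
}


def forms_per_scenario(counts):
    """Return the average number of forms per scenario for each language."""
    return {lang: forms / scenarios for lang, (forms, scenarios) in counts.items()}


total_forms = sum(forms for forms, _ in COUNTS.values())
total_scenarios = sum(scenarios for _, scenarios in COUNTS.values())
ratios = forms_per_scenario(COUNTS)

# Print languages from the highest to the lowest ratio.
for lang, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{lang}: {ratio:.2f} forms per scenario")
print(f"total: {total_forms} forms, {total_scenarios} scenarios")
```

With the figures in Table 4, the totals come to 3207 forms and 1472 scenarios; Chiru has the highest ratio and Pangkhua the lowest.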

Figure 3
Overview of database modules and their relations. Two-way arrows indicate direct links between files, e.g., the forms file can be joined with the cells file via the cell identifier, which appears in both files. One-way arrows indicate subset relations, e.g., each phoneme in the phon_form columns appears as a separate entry in the sound file.
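
The two kinds of relation described in this caption can be sketched with Python's standard `csv` module. Everything in the toy data is an assumption made for illustration: the column names other than `phon_form`, the identifiers, the invented forms, and the convention that phonemes in `phon_form` are separated by spaces. The actual PMST files may be organised differently.

```python
import csv
import io

# Toy stand-ins for the forms, cells and sound files (all values are invented).
FORMS_CSV = """form_id,cell,phon_form
f1,c1,p a
f2,c2,p a m
"""
CELLS_CSV = """cell,person,polarity
c1,1sg,affirmative
c2,1sg,negative
"""
SOUNDS_CSV = """sound
p
a
m
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_forms_with_cells(forms, cells):
    """Two-way link: attach each form's cell description via the shared cell identifier."""
    cells_by_id = {row["cell"]: row for row in cells}
    return [{**form, **cells_by_id[form["cell"]]} for form in forms]


def missing_sounds(forms, sounds):
    """One-way (subset) link: phonemes used in phon_form but absent from the sound file."""
    inventory = {row["sound"] for row in sounds}
    used = {segment for form in forms for segment in form["phon_form"].split()}
    return used - inventory


forms, cells, sounds = (read_rows(text) for text in (FORMS_CSV, CELLS_CSV, SOUNDS_CSV))
joined = join_forms_with_cells(forms, cells)
```

An empty result from `missing_sounds(forms, sounds)` means the subset relation holds for the toy data; any phoneme it returns would point to a form whose segmentation is not covered by the sound file.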

Figure 4
Distributional profile of morphs in the Ranglong dataset. The top panel shows the distribution across tense-aspect and polarity values. The middle panel shows the distribution across person configurations. The bottom panel shows the distribution across number categories. Stripes are used for elements appearing before the verb stem and circles for those appearing after.

Figure 5
Length in phonemes of verb forms (minus the lexical stem) in transitive affirmative scenarios, aggregated per scenario and language. The dot indicates the average; the whiskers show the range. Languages are arranged by subgroup and relatedness (cf. Table 1).
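
One possible reading of the aggregation behind Figure 5 is a two-step summary: lengths are first averaged within each scenario of a language, and the mean and range are then taken across that language's scenarios. The sketch below implements this reading only; it is an assumption about the procedure, not the authors' script, and the input triples (language, scenario, length in phonemes with the stem already excluded) are invented toy values.

```python
from collections import defaultdict
from statistics import mean


def length_summary(records):
    """Summarise form length per language as (mean, minimum, maximum).

    records: iterable of (language, scenario, length) triples.
    Lengths are averaged per scenario first, then summarised per language.
    """
    per_scenario = defaultdict(list)
    for language, scenario, length in records:
        per_scenario[(language, scenario)].append(length)

    per_language = defaultdict(list)
    for (language, _scenario), lengths in per_scenario.items():
        per_language[language].append(mean(lengths))

    return {
        language: (mean(values), min(values), max(values))
        for language, values in per_language.items()
    }


# Invented toy data: language "A" has two forms for scenario "1>2".
toy = [("A", "1>2", 3), ("A", "1>2", 5), ("A", "2>1", 2), ("B", "1>2", 6)]
summary = length_summary(toy)
```

In the toy data, language "A" has scenario averages of 4 and 2, so its summary is a mean of 3 with a range from 2 to 4.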

| ABBREVIATION | MEANING |
|---|---|
| 1 | first person |
| 2 | second person |
| 3 | third person |
| A | A (actor) argument of a transitive predicate |
| INV | inverse |
| IPFV | imperfective |
| NEG | negation |
| NFUT | non-future |
| P | P (undergoer) argument of a transitive predicate |
| S | sole argument of an intransitive predicate |
| SAP | speech-act participant (first and second person) |
| SC | South-Central (branch of Trans-Himalayan) |
| SG | singular |
| TR | transitive |
