Introducing the First Module of the Multimedia Corpus of Spoken Kazakh Language

Open Access | May 2026

Abstract

The first module of the Multimedia Corpus of Spoken Kazakh Language is a dataset documenting contemporary Kazakh as spoken in Kazakhstan and Xinjiang (China). It includes 33 audio recordings (ca. 12 hours) with time-aligned transcriptions, collected from 78 participants. The recordings feature naturally occurring conversation among native Kazakh speakers. The corpus is anonymized and published under a CC BY 4.0 license. The dataset is intended as a linguistic resource for the empirical analysis of Kazakh, and it is suitable for reuse in a wide range of linguistics-adjacent disciplines concerned with the analysis of naturally occurring language in use.

DOI: https://doi.org/10.5334/johd.529 | Journal eISSN: 2059-481X
Language: English
Page range: 67 - 67
Submitted on: Feb 26, 2026
Accepted on: Apr 24, 2026
Published on: May 25, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Giorgia Troiani, Andrey Filchenko, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.