Introducing the First Module of the Multimedia Corpus of Spoken Kazakh Language

Open Access | May 2026

References

  1. Akanov, A. (2025). Russian ‘vot’ as an interactional practice in bilingual Kazakh conversations. International Journal of Bilingualism (Special issue: Language Convergence and Diversity in the Post‑Soviet Multilingual Diaspora Across the World), 1–25. 10.1177/13670069251396260
  2. Arkhipov, A., & Däbritz, C. L. (2018). Hamburg corpora for indigenous Northern Eurasian languages. Томский журнал лингвистических и антропологических исследований, 3, 9–18. 10.23951/2307-6119-2018-3-9-18
  3. Bahry, S. A. (2016). Language ecology: Understanding Central Asian multilingualism. In E. S. Ahn & J. Smagulova (Eds.), Language change in Central Asia (pp. 11–32). De Gruyter. 10.1515/9781614514534-006
  4. Chafe, W. L. (1994). Discourse, consciousness, and time: the flow and displacement of conscious experience in speaking and writing. University of Chicago Press.
  5. Chernyavskaya, V. E., & Zharkynbekova, S. K. (2024). Code switching patterns in Kazakh-Russian hybrid language practice: An empirical study. Training, Language and Culture, 8(2), 9–19. 10.22363/2521-442X-2024-8-2-9-19
  6. Chui, K., & Lai, H.-L. (2008). The NCCU corpus of spoken Chinese: Mandarin, Hakka, and Southern Min. Taiwan Journal of Linguistics, 6(2), 119–144.
  7. Dobrushina, N., & Moroz, G. (2021). The speakers of minority languages are more multilingual. International Journal of Bilingualism, 25(4), 921–938. 10.1177/13670069211023150
  8. Du Bois, J. W., Chafe, W., Meyers, C., & Thompson, S. A. (2000). Santa Barbara corpus of spoken American English. Linguistic Data Consortium. 10.35111/S2Q7-GQ73
  9. Du Bois, J. W., Schuetze-Coburn, S., Cumming, S., & Paolino, D. (1993). Outline of discourse transcription. In J. A. Edwards & M. D. Lampert (Eds.), Talking data: Transcription and coding in discourse research (pp. 45–89). Lawrence Erlbaum Associates Publishers.
  10. Himmelmann, N. P., Sandler, M., Strunk, J., & Unterladstetter, V. (2018). On the universality of intonational phrases: a cross-linguistic interrater study. Phonology, 35(2), 207–245. 10.1017/S0952675718000039
  11. Khassanov, Y., Mussakhojayeva, S., Mirzakhmetov, A., Adiyev, A., Nurpeiissov, M., & Varol, H. A. (2021). A crowdsourced open-source Kazakh speech corpus and initial speech recognition baseline. In P. Merlo, J. Tiedemann, & R. Tsarfaty (Eds.), Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main volume (pp. 697–706). Association for Computational Linguistics. 10.18653/v1/2021.eacl-main.58
  12. Koptleuova, K., Karagulova, B., Zhumakhanova, A., Kondybay, K., & Salikhova, A. (2023). Multilingualism and the current language situation in the Republic of Kazakhstan. International Journal of Society, Culture and Language, 11(3), 242–257. 10.22034/ijscl.2023.2007080.3099
  13. Madiyeva, G., Michael, D., Arkhangelsky, T., Toldova, S., Lyashevskaya, O., Umatova, Z., …, Alisheva, Z. (2016). Almaty corpus of Kazakh language. Retrieved from https://web-corpora.net/KazakhCorpus/search/?interface_language=en
  14. Makhambetov, O., Makazhanov, A., Yessenbayev, Z., Matkarimov, B., Sabyrgaliyev, I., & Sharafudinov, A. (2013). Assembling the Kazakh language corpus. EMNLP 2013–2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1022–1031). 10.18653/v1/D13-1104
  15. Mauri, C., Ballarè, S., Goria, E., Cerruti, M., & Suriano, F. (2019). KIParla corpus: A new resource for spoken Italian. In R. Bernardi, R. Navigli, & G. Semeraro (Eds.), Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019) (pp. 243–249). CEUR Workshop Proceedings.
  16. Troiani, G. (2023). Representing a language in use: corpus construction, prosody, and grammar in Kazakh (PhD thesis).
  17. Troiani, G., Du Bois, J. W., & Filchenko, A. (2024). Corpus as a slice of life: Representing naturally occurring language and its speakers. Research in Corpus Linguistics, 12(2), 174–202. 10.32714/ricl.12.02.08
  18. Troiani, G., & Mukanova, K. (in press). Conversational functions of Russian‑borrowed ‘like’‑quotatives tipa and takoj in Kazakh spoken discourse. International Journal of Bilingualism (Special issue: Language Convergence and Diversity in the Post‑Soviet Multilingual Diaspora Across the World).
  19. Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: a professional framework for multimodality research. In N. Calzolari, et al. (Eds.), Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06). European Language Resources Association (ELRA). 10.63317/5pwa5zpssv4z
  20. Zhanabekova, A. A. (2012). Ұлттық Корпус Дегеніміз Не? [What is the national corpus?]. In Proceedings of the international scientific-practical conference (Vol. 1, pp. 57–61).
DOI: https://doi.org/10.5334/johd.529 | Journal eISSN: 2059-481X
Language: English
Page range: 67 - 67
Submitted on: Feb 26, 2026
Accepted on: Apr 24, 2026
Published on: May 25, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Giorgia Troiani, Andrey Filchenko, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.