The Reusability of Resources in Language-Specific Contexts: The SADiLaR Repository as a Case Study

Open Access | May 2026

References

  1. Adelani, D. I., Neubig, G., Ruder, S., Rijhwani, S., Beukman, M., Palen-Michel, C., Lignos, C., Alabi, J. O., Muhammad, S. H., Nabende, P., Dione, C. M. B., Bukula, A., Mabuya, R., Dossou, B. F. P., Sibanda, B., Buzaaba, H., Mukiibi, J., Kalipe, G., Mbaye, D., … Klakow, D. (2022). MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. arXiv. 10.48550/arXiv.2210.12391
  2. Brink, N. (2020). A usage-based investigation of Afrikaans-speaking children’s holophrases and communicative intentions. Stellenbosch Papers in Linguistics Plus, 59. 10.5842/59-0-860
  3. De Jong, F. M. G., Maegaard, B., De Smedt, K., Fišer, D., & Van Uytvanck, D. (2018). CLARIN: Towards FAIR and Responsible Data Science Using Language Resources. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 3259–3264). https://hdl.handle.net/1874/364776
  4. De Wet, F., Eiselen, R., Schillack, E., & Puttkammer, M. (2023). Investigating the Extent and Usability of Webtext Available in South Africa’s Official Languages. In A. Pillay, E. Jembere & A. J. Gerber (Eds.), Artificial Intelligence Research (Vol. 1976, pp. 120–135). Springer Nature Switzerland. 10.1007/978-3-031-49002-6_9
  5. Du Toit, J. S., & Puttkammer, M. J. (2021). Developing Core Technologies for Resource-Scarce Nguni Languages. Information, 12(12), 520. 10.3390/info12120520
  6. Eiselen, R., & Gaustad, T. (2023). Deep learning and low-resource languages: How much data is enough? A case study of three linguistically distinct South African languages. Proceedings of the Fourth Workshop on Resources for African Indigenous Languages (RAIL 2023), 42–53. 10.18653/v1/2023.rail-1.6
  7. Gaustad, T., & McKellar, C. A. (2024). Updated Morphologically Annotated Corpora for 9 South African Languages. Journal of Open Humanities Data, 10, 38. 10.5334/johd.211
  8. Gaustad, T., McKellar, C. A., & Puttkammer, M. J. (2025). Datasets for South African Languages: Bilingual Aligned and Monolingual Data for Machine Translation. Journal of Open Humanities Data, 11, 50. 10.5334/johd.372
  9. Gaustad, T., & Puttkammer, M. J. (2022). Linguistically annotated dataset for four official South African languages with a conjunctive orthography: IsiNdebele, isiXhosa, isiZulu, and Siswati. Data in Brief, 41, 107994. 10.1016/j.dib.2022.107994
  10. Kaffee, L.-A., Biswas, R., Keet, C. M., Vakaj, E. K., & de Melo, G. (2023). Multilingual Knowledge Graphs and Low-Resource Languages: A Review. Transactions on Graph Data and Knowledge (TGDK), 1(1), 10:1–10:19. 10.4230/TGDK.1.1.10
  11. Marivate, V. (2020). Why African natural language processing now? A view from South Africa #AfricaNLP. Leap 4.0: African Perspectives on the Fourth Industrial Revolution, 126–151. 10.2307/jj.12406168.11
  12. Marivate, V., Olaleye, K., Mundia, S., Bakainga, A., Netshifhefhe, U., Milanzie, M., Mogale, T. H., Sindane, T., Abdulrasaq, Z., Mokgosi, K., Okorie, C., Van Wyk, N. Z., Morrissey, G., Dunbar, D., Smit, F., Chidi, T., Mabuya, R., Bukula, A., Mlambo, R., … Rananga, S. (2025). Swivuriso: The South African Next Voices multilingual speech dataset. arXiv. 10.48550/arXiv.2512.02201
  13. Mathiessen, M., & Lenardič, J. (2025). CLARIN Data Citation Guidelines. https://www.clarin.eu/content/clarin-data-citation-guidelines
  14. McKellar, C. A., & Puttkammer, M. J. (2020). Dataset for comparable evaluation of machine translation between 11 South African languages. Data in Brief, 29, Article 105146. 10.1016/j.dib.2020.105146
  15. Meyer, F., Song, H., Chakrabarty, A., Buys, J., Dabre, R., & Tanaka, H. (2024). NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 12247–12258). European Language Resources Association. 10.63317/2nawh3refbdd
  16. Mlambo, R., & Matfunjwa, M. (2025). Human language technology tools for indigenous South African languages and their potential use. Literator, 46(1), a2049. 10.4102/lit.v46i1.2049
  17. Puttkammer, M., Eiselen, R., Hocking, J., & Koen, F. (2018). NLP Web Services for Resource-Scarce Languages. Proceedings of ACL 2018, System Demonstrations, 43–49. 10.18653/v1/P18-4008
  18. Rabé, M. (2021). Kodewisseling in Afrikaans-Nederlandse kinders se spraak. Unpublished master’s thesis. North-West University. 10.13140/RG.2.2.13245.59368
  19. Setaka, M., & Trollip, B. (2022). Resource Repositories and linking resources: An exploratory study. Journal of the Digital Humanities Association of Southern Africa (DHASA), 04(02). 10.55492/dhasa.v4i02.4342
  20. Sibeko, J., & Setaka, M. (2023). An overview of Sesotho BLARK content. Journal of the Digital Humanities Association of Southern Africa (DHASA), 4(01). 10.55492/dhasa.v4i01.4440
  21. Sibeko, J., & Van Zaanen, M. (2023). A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects. Journal of Open Humanities Data, 9, 9. 10.5334/johd.108
  22. Sibeko, J., & Van Zaanen, M. (2025). Developing and testing syllabification systems for South African Sesotho. Language Resources and Evaluation, 59(2), 1577–1592. 10.1007/s10579-024-09770-8
  23. Skosana, N. J., & Mlambo, R. (2021). A brief study of the Autshumato Machine Translation Web Service for South African languages. Literator, 42(1). 10.4102/lit.v42i1.1766
  24. Terblanche, C., Schnoor, T. T., Harty, M., & Tucker, B. V. (2025). The development of synthetic child speech in three South African languages. Augmentative and Alternative Communication, 41(4), 333–344. 10.1080/07434618.2024.2374312
  25. Trollip, B. (2023). ’n Gebruiksgebaseerde beskrywing van Afrikaanse prefiksoïede. LitNet Akademies, 20(3), 851–892. 10.56273/1995-5928/2023/j20n3g1
  26. Trollip, B., & Strauss, T. (2024). Analysing Afrikaans lexical blends using Levenshtein distances. North-West University. 10.25388/NWU.25052690
  27. Van Erp, M. (2012). Reusing Linguistic Resources: Tasks and Goals for a Linked Data Approach. In C. Chiarcos, S. Nordhoff & S. Hellmann (Eds.), Linked Data in Linguistics. Springer. 10.1007/978-3-642-28249-2_6
  28. Van der Walt, A., Steyn, J., Trusler, A., & Van Zaanen, M. (2023). Challenges and opportunities of digital humanities training in South Africa: Moving beyond the silos. In L. Estill & J. Guiliano (Eds.), Digital humanities workshops: Lessons learned. Taylor & Francis. 10.4324/9781003301097-7
  29. Weber, T. (2021). Citation tracking and versioning for linguistic examples. In CLARIN Annual Conference Proceedings (CLARIN 2021) (pp. 114–118).
  30. Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 1–9. 10.1038/sdata.2016.18
DOI: https://doi.org/10.5334/johd.523 | Journal eISSN: 2059-481X
Language: English
Page range: 71 - 71
Submitted on: Feb 27, 2026
Accepted on: May 4, 2026
Published on: May 28, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Benito Trollip, Michelle White, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.