Datasets for South African Languages: Bilingual Aligned and Monolingual Data for Machine Translation

Open Access | Sep 2025

DOI: https://doi.org/10.5334/johd.372 | Journal eISSN: 2059-481X
Language: English
Submitted on: Aug 12, 2025
Accepted on: Sep 8, 2025
Published on: Sep 19, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Tanja Gaustad, Cindy A. McKellar, Martin J. Puttkammer, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.