A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects

Open Access | Jul 2023

Abstract

This article describes a data set of reading comprehension and summary writing texts used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts in eleven official South African languages. The PDF versions of the texts come from the online public-access repository of South Africa’s Department of Basic Education. Plain text is extracted from the PDFs and then tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data is useful for studies investigating, for example, linguistic properties, text readability, text properties, and linguistic complexity in any of the eleven languages. It also supports both intra-language and inter-language comparisons.
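As a rough illustration of the kind of study the abstract mentions, the sketch below tokenizes a plain-text passage and computes a few simple surface measures (token count, type-token ratio, mean sentence length). It is a minimal example under stated assumptions, not the authors' pipeline: the regex tokenizer, the sentence splitter, and the function names `tokenize` and `text_stats` are all hypothetical, and the data set's own tokenization may differ.

```python
import re

# Assumption: a naive regex tokenizer. Words may contain internal hyphens or
# apostrophes; every other non-space character is a separate punctuation token.
TOKEN_RE = re.compile(r"\w+(?:[-'’]\w+)*|[^\w\s]", re.UNICODE)


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def text_stats(text):
    """Compute simple surface statistics for one plain-text passage."""
    tokens = tokenize(text)
    # Keep only tokens that start with a letter or digit, lowercased.
    words = [t.lower() for t in tokens if t[0].isalnum()]
    # Assumption: sentences end at '.', '!' or '?'. Abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    types = set(words)
    return {
        "tokens": len(tokens),
        "words": len(words),
        "types": len(types),
        "type_token_ratio": len(types) / len(words) if words else 0.0,
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


if __name__ == "__main__":
    sample = "Die kat sit. Die hond slaap!"
    print(text_stats(sample))
```

Because the measures depend only on the token stream, the same function can be applied to texts in different languages for an inter-language comparison, although surface measures such as word counts are not directly comparable between languages with very different word-formation patterns.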

DOI: https://doi.org/10.5334/johd.108 | Journal eISSN: 2059-481X
Language: English
Submitted on: Apr 27, 2023
Accepted on: Jun 9, 2023
Published on: Jul 5, 2023
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Johannes Sibeko, Menno van Zaanen, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.