Table 1
Overview of total token counts for all nine languages included in the dataset.
| LANGUAGE | NUMBER OF TOKENS |
|---|---|
| isiNdebele | 42,335 |
| isiXhosa | 46,465 |
| isiZulu | 45,933 |
| Siswati | 43,568 |
| Sesotho | 73,727 |
| Sesotho sa Leboa/Sepedi | 73,031 |
| Setswana | 72,609 |
| Tshivenḓa | 66,487 |
| Xitsonga | 69,584 |

Table 2
Overview of the number of unique main morphology tags and the total number of distinct morphology tags per language.
| LANGUAGE | NUMBER OF UNIQUE MAIN MORPHOLOGY TAGS (WITHOUT CLASS INFORMATION) | TOTAL NUMBER OF MORPHOLOGY TAGS |
|---|---|---|
| isiNdebele | 71 | 401 |
| isiXhosa | 77 | 370 |
| isiZulu | 74 | 423 |
| Siswati | 69 | 378 |
| Sesotho | 74 | 292 |
| Sesotho sa Leboa/Sepedi | 65 | 319 |
| Setswana | 63 | 313 |
| Tshivenḓa | 64 | 439 |
| Xitsonga | 67 | 290 |
