Abstract
This dataset provides a morphologically annotated sample of 313 Arabic words drawn from a larger corpus of 223,690 words compiled from Sharaye al-Islam, a classical Arabic jurisprudential text. Each token is annotated with its segmentation, lemma, part of speech, and affixes, and every annotation has been manually verified. The data are stored as a UTF-8 encoded CSV file and shared openly via Zenodo. The resource supports the training and benchmarking of Arabic morphological analysis systems and, more broadly, the development and evaluation of AI-based models for Arabic natural language processing.
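
As a rough illustration of how a token-level UTF-8 CSV of this kind might be consumed, the following Python sketch parses a small inline sample and tallies part-of-speech tags. The column names (`token`, `segmentation`, `lemma`, `pos`, `prefixes`, `suffixes`), the `+` segment separator, the tag labels, and the two sample rows are all assumptions made for illustration; they are not taken from the published file, whose headers and conventions may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical header and rows, for illustration only; the published CSV
# may use different column names, separators, and tag labels.
SAMPLE_CSV = """token,segmentation,lemma,pos,prefixes,suffixes
والكتاب,و+ال+كتاب,كتاب,NOUN,و+ال,
كتبوا,كتب+وا,كتب,VERB,,وا
"""


def load_annotations(handle):
    """Read annotated tokens from a CSV file handle into a list of dicts."""
    return list(csv.DictReader(handle))


def pos_distribution(rows):
    """Count how many tokens carry each part-of-speech tag."""
    return Counter(row["pos"] for row in rows)


rows = load_annotations(io.StringIO(SAMPLE_CSV))
counts = pos_distribution(rows)

# Split the (assumed) '+'-delimited segmentation of the first token.
first_segments = rows[0]["segmentation"].split("+")
```

To read a file from disk instead of the inline sample, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` stands for wherever the downloaded CSV is saved.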
