Abstract
Arabic’s rich and heterogeneous morphology continues to challenge computational analysis, particularly when models trained on Modern Standard Arabic (MSA) are applied to classical and scriptural domains. This discussion paper presents a tri-domain evaluation framework for assessing the domain sensitivity of three widely used morphological analyzers (Farasa, CAMeL, and ALP) across the NAFIS (MSA), Quranic, and Noor–Ghateh (Hadith/Jurisprudential) corpora. Using a unified normalization and segmentation-alignment pipeline, together with bootstrap confidence intervals and paired non-parametric significance tests, the study provides a statistically robust characterization of system performance across domains whose corpora differ markedly in size and linguistic profile. The results show that, although overall accuracy can be higher on classical and scriptural text than on MSA, all three analyzers exhibit systematic weaknesses when confronted with classical lexical forms, dense clitic constructions, and archaic morphological patterns, especially at the stem and suffix levels. By outlining the methodological, linguistic, and practical implications of these findings, the paper demonstrates how transparent, multi-domain benchmarking can expose structural limitations in Arabic morphological modeling and guide the development of more adaptable language technologies.
