Abstract
Text classification is the task of assigning textual data to predefined categories, and it plays a central role in natural language processing. In recent years, deep learning models have outperformed traditional machine learning approaches on text classification tasks. This paper presents a supervised deep learning approach to hierarchical Arabic text classification. To support this study, we developed WiHArD, a novel hierarchical Arabic text dataset in which each text is labeled according to a structured category hierarchy. We then propose a deep learning model that combines BERT-based feature extraction with a neural network classifier. BERT encodes textual inputs into dense vector representations, and the neural network learns to classify texts within the hierarchical structure. Our comparative study shows that the proposed BERT-ANN model achieves significant improvements in hierarchical classification performance, outperforming the existing HMATC model. These findings highlight the effectiveness of deep learning-based approaches in advancing Arabic text classification.
