
Mapping Multiclass-Targeted Hate Speech in Online Discourse: An Open Dataset
Abstract
Online social networks have become central spaces for public discourse, where hostile and discriminatory language toward social groups can have psychological and social consequences for marginalized communities. Although multiple public hate speech datasets are available, many rely on binary categorization practices that obscure linguistic, cultural, and contextual variation across targeted groups. As a result, minority and less visible forms of hate speech remain insufficiently documented and analyzed. This discussion paper examines methodological limitations in existing hate speech annotation schemes and presents a re-annotation framework applied to the HatEval2019 dataset. The proposed framework introduces target-specific multiclass labels that distinguish subcategories of gender-based, racial, ethnic, religious, and xenophobic hate speech, enabling more fine-grained analysis of online discourse. The annotation process involved multiple independent annotators, systematic reliability assessment, and iterative guideline refinement. The resulting dataset comprises 5,455 annotated texts that differentiate among targeted hate speech, direct insults, and specific target subcategories. Detailed annotation guidelines and documentation are provided, and the dataset is openly available in tabular format. This paper documents interpretive decisions, ethical considerations, and data practices, enabling reuse of the dataset across digital humanities, discourse analysis, media studies, and social justice research. The dataset allows researchers to examine how hate speech, identity, and power relations are constructed in online communication and contributes to more transparent and responsible humanities data infrastructures.
© 2026 Sanaa Kaddoura, Sumaia Al-Kohlani, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.