Mapping Multiclass-Targeted Hate Speech in Online Discourse: An Open Dataset

Sanaa Kaddoura; Sumaia Al-Kohlani

doi:10.5334/johd.521


Journal of Open Humanities Data

Volume 12 (2026): Issue 1


Open Access

April 2026

Abstract

Online social networks have become central spaces for public discourse, where hostile and discriminatory language toward social groups can have serious psychological and social consequences for marginalized communities. Although multiple public hate speech datasets are available, many rely on binary categorization practices that obscure linguistic, cultural, and contextual variation across targeted groups. As a result, minority and less visible forms of hate speech remain insufficiently documented and analyzed. This discussion paper examines methodological limitations in existing hate speech annotation schemes and presents a re-annotation framework applied to the HatEval2019 dataset. The proposed framework introduces target-specific multiclass labels that distinguish subcategories of gender-based, racial, ethnic, religious, and xenophobic hate speech, enabling more fine-grained analysis of online discourse. The annotation process involved multiple independent annotators, systematic reliability assessment, and iterative refinement of the guidelines. The resulting dataset comprises 5,455 annotated texts that differentiate between targeted hate speech, direct insults, and specific target subcategories. Detailed annotation guidelines and documentation are provided, and the dataset is openly available in tabular format. This paper documents interpretive decisions, ethical considerations, and data practices, enabling reuse of the dataset across digital humanities, discourse analysis, media studies, and social justice research. The dataset allows researchers to examine how hate speech, identity, and power relations are constructed in online communication, and it contributes to more transparent and responsible humanities data infrastructures.
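The abstract mentions a systematic reliability assessment across multiple independent annotators but does not name the statistic used. As a minimal sketch, assuming every text was labelled by the same number of annotators, Fleiss' kappa is one common chance-corrected agreement measure for several annotators assigning nominal categories. The function name `fleiss_kappa` and the label strings below are hypothetical placeholders, not the dataset's actual columns or category names.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for N items, each labelled by the same number n of annotators.

    ratings[i] holds the n labels the annotators assigned to item i.
    """
    if not ratings:
        raise ValueError("need at least one item")
    n = len(ratings[0])  # annotators per item
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of labels")

    N = len(ratings)
    per_item = [Counter(item) for item in ratings]  # n_ij: votes for category j on item i

    # Observed agreement: share of agreeing annotator pairs per item, averaged over items.
    p_bar = sum(
        (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
        for counts in per_item
    ) / N

    # Expected (chance) agreement from each category's overall share p_j.
    totals: Counter = Counter()
    for counts in per_item:
        totals.update(counts)
    p_e = sum((t / (N * n)) ** 2 for t in totals.values())

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical labels from three annotators for three texts.
labels = [
    ["hate_women", "hate_women", "insult"],
    ["hate_immigrants", "hate_immigrants", "hate_immigrants"],
    ["not_hate", "insult", "not_hate"],
]
print(round(fleiss_kappa(labels), 3))  # prints 0.4
```

Kappa corrects raw agreement for the agreement expected by chance given how often each category is used overall, which matters for multiclass schemes where some target subcategories are rare. Pairwise Cohen's kappa or Krippendorff's alpha (which also tolerates missing annotations) are alternatives; which one fits depends on how the annotation rounds were actually organised.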
