A Telegram Corpus for Hate Speech, Offensive Language, and Online Harm

Open Access | Jul 2021

Abstract

We provide a new text corpus from the social media platform Telegram that is rich in indirect forms of divisive speech. We scraped all messages from one channel of Donald Trump supporters, covering a large part of his presidency, from late 2016 until January 2021, including the January 6 Capitol riot. Over this long period, the discussion among group members includes the spread of disinformation, disparagement of out-group members, and other forms of harmful speech. To enable research into the role of harmful speech in political discourse, we added two types of annotations to the corpus: (i) automatic annotations of offensive language for all messages, and (ii) our own manual annotations of harmful language for a portion of the posts leading up to the January 2021 Capitol riot and its aftermath.

DOI: https://doi.org/10.5334/johd.32 | Journal eISSN: 2059-481X
Language: English
Published on: Jul 5, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Veronika Solopova, Tatjana Scheffler, Mihaela Popa-Wyatt, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.