A Reproducible IT-Blog Corpus
Open Access | Jul 2021

Abstract

The dataset comprises text and metadata extracted from several hundred IT-blogs and websites, along with a method to reproduce the data by updating its contents and downloading it to the user’s local machine. The targets have been hand-picked with the intention of representing the discourse on blogs and websites from Germany and the United States of America that are dedicated to questions at the intersection of technology and society. The texts have been retrieved using web crawling techniques. The resulting corpus is accessible through a search platform and is also reproducible using freely accessible descriptors and software.

DOI: https://doi.org/10.5334/johd.35 | Journal eISSN: 2059-481X
Language: English
Published on: Jul 22, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Adrien Barbaresi, Jens Pohlmann, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.