Abstract
The City-Data.com Corpus provides over 15,000 discussion forum posts scraped from city-data.com, a website that hosts information about cities across the United States. Like the 20 Newsgroups dataset, the City-Data.com Corpus is weakly labeled by forum topic and thread title. It can be used to trial natural language processing techniques or to stage lessons in digital textual analysis as part of digital humanities pedagogy.
