The CONLIT Dataset of Contemporary Literature

By: Andrew Piper  
Open Access
Oct 2022

Figures & Tables

Table 1

List of genres, their selection criteria, and the total number of documents per category.

CODE | GENRE | INSTRUMENTALITY | PLATFORM | SELECTION CRITERIA | # DOCS
BIO | Biography | Non-fiction | Goodreads | “Best memoir/biography/autobiography” list | 193
BS | Bestseller | Fiction | New York Times | Fiction published since 2001 with the longest aggregate time on the New York Times bestseller list | 249
HIST | History | Non-fiction | Amazon | Books listed under “history” under the “bestsellers” tag | 205
MEM | Memoir | Non-fiction | Amazon | Books listed under “memoir” under the “bestsellers” tag | 229
MID | Middle school | Fiction | Goodreads | Goodreads Choice Awards for “Middle Grade” books | 166
MIX | Assorted non-fiction | Non-fiction | Amazon | Books listed under assorted non-fiction tags such as “health”, “politics”, and “business”, under the “bestsellers” tag | 193
MY | Mystery | Fiction | Amazon | Books listed under “Mystery, Thriller, Suspense” under the “bestsellers” tag | 234
NYT | New York Times reviewed | Fiction | New York Times | Fiction reviewed in the New York Times Book Review | 419
PW | Prizelists | Fiction | 5 Prizelists (US, UK, Canada) | Works shortlisted for the National Book Award (US), PEN/Faulkner Award (US), Governor General’s Award (Canada), Giller Prize (Canada), and the Man Booker Prize (UK) | 258
ROM | Romance | Fiction | Amazon | Books listed under “Romance” under the “bestsellers” tag | 208
SF | Science-Fiction | Fiction | Amazon | Books listed under “Science Fiction & Fantasy” under the “bestsellers” tag | 223
YA | Young Adult | Fiction | Goodreads | Goodreads Choice Awards for Young Adult Fiction | 177

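The per-genre counts in Table 1 can be summed by instrumentality to get the size of the fiction and non-fiction portions of the collection. A minimal Python sketch, with the counts transcribed from the table; the dictionary and function names are illustrative only and are not part of the released dataset:

```python
# Document counts per genre code, transcribed from Table 1.
# Each entry maps a genre code to (instrumentality, number of documents).
GENRE_COUNTS = {
    "BIO": ("Non-fiction", 193),
    "BS": ("Fiction", 249),
    "HIST": ("Non-fiction", 205),
    "MEM": ("Non-fiction", 229),
    "MID": ("Fiction", 166),
    "MIX": ("Non-fiction", 193),
    "MY": ("Fiction", 234),
    "NYT": ("Fiction", 419),
    "PW": ("Fiction", 258),
    "ROM": ("Fiction", 208),
    "SF": ("Fiction", 223),
    "YA": ("Fiction", 177),
}


def totals_by_instrumentality(counts):
    """Sum document counts for each instrumentality (Fiction / Non-fiction)."""
    totals = {}
    for instrumentality, n_docs in counts.values():
        totals[instrumentality] = totals.get(instrumentality, 0) + n_docs
    return totals


if __name__ == "__main__":
    by_type = totals_by_instrumentality(GENRE_COUNTS)
    print(by_type)                # {'Non-fiction': 820, 'Fiction': 1934}
    print(sum(by_type.values()))  # 2754
```

By this arithmetic the twelve genres total 2,754 documents: 1,934 fiction and 820 non-fiction.
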
Table 2

List of 20 features included in our data.

FEATURE | DESCRIPTION | ANNOTATION TYPE
Category | Fiction or non-fiction | Manual
Genre | Twelve categories | Manual
Publication Date | Date of first publication | Manual
Author Gender | Perceived authorial gender | Manual
POS | Part-of-speech uni- and bigrams | Computational
Supersense | Frequencies of 41 supersense categories | Computational
Word Frequencies | Word frequencies for every book/1,000-word passage | Computational
Token Count | Work length, measured in tokens | Computational
Total Characters | Estimated total number of named characters | Computational
Protagonist Concentration | Percentage of all character mentions that refer to the main character | Computational
Avg. Sentence Length | Average length of all sentences per book | Computational
Avg. Word Length | Average length of all words per book | Computational
Tuldava Score | Reading difficulty measure | Computational
Event Count | Estimated number of diegetic events | Computational
Goodreads Avg. Rating | Average user rating on Goodreads | Computational
Goodreads Total Ratings | Total number of ratings on Goodreads as of June 2022 | Computational
Average Speed | Measure of narrative pace | Computational
Minimum Speed | Measure of narrative distance | Computational
Volume | Measure of topical heterogeneity | Computational
Circuitousness | Measure of narrative non-linearity | Computational

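Several of the computational features in Table 2 reduce to short formulas. The Python sketch below illustrates plausible versions of five of them. The function names are hypothetical, and the exact procedures used to build CONLIT (tokenization, character resolution, chunking, embedding model) are not described here, so every definition is an assumption. In particular, the Tuldava function uses the commonly cited form (mean syllables per word multiplied by the natural log of mean words per sentence), and the speed function assumes each book has been split into ordered chunks represented as vectors, with average and minimum speed read as the mean and the smallest distance between consecutive chunks.

```python
import math
from typing import Dict, List, Sequence


def avg_sentence_length(sentences: List[List[str]]) -> float:
    """Mean number of tokens per sentence (each sentence given as a token list)."""
    return sum(len(s) for s in sentences) / len(sentences)


def avg_word_length(words: Sequence[str]) -> float:
    """Mean number of characters per word."""
    return sum(len(w) for w in words) / len(words)


def tuldava(n_syllables: int, n_words: int, n_sentences: int) -> float:
    """Tuldava difficulty: (syllables per word) * ln(words per sentence).

    ASSUMPTION: commonly cited form of the formula; the exact variant
    used for CONLIT is not specified in the table.
    """
    return (n_syllables / n_words) * math.log(n_words / n_sentences)


def protagonist_concentration(mentions: Dict[str, int]) -> float:
    """Percentage of all character mentions that refer to the main character.

    ASSUMPTION: mentions are already resolved to character IDs, and the
    main character is taken to be the most frequently mentioned one.
    """
    total = sum(mentions.values())
    return 100.0 * max(mentions.values()) / total


def speeds(chunks: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance between each pair of consecutive chunk vectors.

    ASSUMPTION: the text is split into ordered chunks embedded as vectors;
    average speed = mean of these steps, minimum speed = smallest step.
    """
    return [math.dist(a, b) for a, b in zip(chunks, chunks[1:])]
```

For example, chunk vectors [0, 0], [3, 4], [3, 5] give steps of 5.0 and 1.0, so under this reading the average speed is 3.0 and the minimum speed is 1.0.
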
Figure 1

Distribution of publication dates of books in our sample.

Figure 2

Distribution of the average user rating on Goodreads for books in our sample. Includes only books with more than 9 ratings.

Figure 3

Distribution of the log-transformed number of ratings on Goodreads for books in our sample. Includes only books with more than 9 ratings.
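Figures 2 and 3 apply the same inclusion rule (more than 9 ratings) before plotting, and Figure 3 additionally log-transforms the rating counts. A minimal Python sketch of that preprocessing, assuming per-book Goodreads data is available as a dictionary of (average rating, rating count) pairs; the function name, the input layout, and the use of the natural log are assumptions, since the captions do not specify them:

```python
import math
from typing import Dict, List, Tuple


def rating_distributions(
    books: Dict[str, Tuple[float, int]], threshold: int = 9
) -> Tuple[List[float], List[float]]:
    """Return (average ratings, log rating counts) for books with > threshold ratings.

    books maps a (hypothetical) book ID to (Goodreads average rating,
    total number of ratings).
    ASSUMPTION: natural log; the captions do not state the base.
    """
    kept = [(avg, n) for avg, n in books.values() if n > threshold]
    avg_ratings = [avg for avg, _ in kept]
    log_counts = [math.log(n) for _, n in kept]
    return avg_ratings, log_counts
```

With toy input {"a": (4.1, 5), "b": (3.8, 120), "c": (4.5, 10)}, book "a" is dropped and the function returns the ratings [3.8, 4.5] together with the logs of 120 and 10.
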

Figure 4

Distribution of author gender by genre.
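A per-genre distribution like the one in Figure 4 amounts to counting gender labels within each genre and normalizing by the genre's size. A minimal Python sketch, assuming one (genre code, perceived-gender label) pair per book; the function name and the example labels "F" and "M" are hypothetical and do not reflect the dataset's actual coding:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def gender_share_by_genre(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Map each genre to the share of its books carrying each gender label.

    records: (genre code, perceived author-gender label) pairs, one per book.
    """
    counts = defaultdict(Counter)
    for genre, gender in records:
        counts[genre][gender] += 1
    # Normalize each genre's label counts so they sum to 1.
    return {
        genre: {label: n / sum(c.values()) for label, n in c.items()}
        for genre, c in counts.items()
    }
```

For toy records [("ROM", "F"), ("ROM", "F"), ("ROM", "M"), ("SF", "M")], the result assigns ROM two thirds "F" and one third "M", and SF entirely "M".
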

DOI: https://doi.org/10.5334/johd.88 | Journal eISSN: 2059-481X
Language: English
Published on: Oct 11, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Andrew Piper, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.