Table 1
List of genres, their selection criteria, and the total number of documents per category.
| CODE | GENRE | INSTRUMENTALITY | PLATFORM | SELECTION CRITERIA | # DOCS |
|---|---|---|---|---|---|
| BIO | Biography | Non-fiction | Goodreads | “Best memoir/biography/autobiography” list | 193 |
| BS | Bestseller | Fiction | New York Times | Fiction published since 2001 with the longest aggregate time on the New York Times bestseller list | 249 |
| HIST | History | Non-fiction | Amazon | Books listed under “history” under the “bestsellers” tag | 205 |
| MEM | Memoir | Non-fiction | Amazon | Books listed under “memoir” under the “bestsellers” tag | 229 |
| MID | Middle school | Fiction | Goodreads | Goodreads Choice Awards for “Middle Grade” books | 166 |
| MIX | Assorted non-fiction | Non-fiction | Amazon | Books listed under assorted non-fiction tags such as “health”, “politics”, and “business”, under the “bestsellers” tag | 193 |
| MY | Mystery | Fiction | Amazon | Books listed under “Mystery, Thriller, Suspense” under the “bestsellers” tag | 234 |
| NYT | New York Times reviewed | Fiction | New York Times | Fiction reviewed in the New York Times Book Review | 419 |
| PW | Prizelists | Fiction | 5 Prizelists (US, UK, Canada) | Works shortlisted for the National Book Award (US), PEN/Faulkner Award (US), Governor General’s Award (Canada), Giller Prize (Canada), and the Man Booker Prize (UK) | 258 |
| ROM | Romance | Fiction | Amazon | Books listed under “Romance” under the “bestsellers” tag | 208 |
| SF | Science Fiction | Fiction | Amazon | Books listed under “Science Fiction & Fantasy” under the “bestsellers” tag | 223 |
| YA | Young Adult | Fiction | Goodreads | Goodreads Choice Awards for Young Adult Fiction | 177 |
Table 2
List of 20 features included in our data.
| FEATURE | DESCRIPTION | ANNOTATION TYPE |
|---|---|---|
| Category | Fiction or non-fiction | Manual |
| Genre | Twelve categories | Manual |
| Publication Date | Date of first publication | Manual |
| Author Gender | Perceived authorial gender | Manual |
| POS | Part-of-speech uni- and bigrams | Computational |
| Supersense | Frequency of 41 word supersenses | Computational |
| Word Frequencies | Word frequencies for every book/1,000-word passage | Computational |
| Token Count | Work length measure | Computational |
| Total Characters | Estimated total number of named characters | Computational |
| Protagonist Concentration | Percentage of all character mentions that refer to the main character | Computational |
| Avg. Sentence Length | Average length of all sentences per book | Computational |
| Avg. Word Length | Average length of all words per book | Computational |
| Tuldava Score | Reading difficulty measure | Computational |
| Event Count | Estimated number of diegetic events | Computational |
| Goodreads Avg. Rating | Average user rating on Goodreads | Computational |
| Goodreads Total Ratings | Total number of ratings on Goodreads as of June 2022 | Computational |
| Average Speed | Measure of narrative pace | Computational |
| Minimum Speed | Measure of narrative distance | Computational |
| Volume | Measure of topical heterogeneity | Computational |
| Circuitousness | Measure of narrative non-linearity | Computational |
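
To make a few of the simpler computational features in Table 2 concrete, the following is a minimal sketch of how token count, average sentence length, average word length, and protagonist concentration could be computed. It is not the pipeline used to build the data: the function names, the regex-based sentence and word splitting, the choice to measure word length in characters, and the assumption that the main character is the most frequently mentioned one are all illustrative.

```python
import re


def length_features(text: str) -> dict:
    """Token count, average sentence length, and average word length.

    Assumption: sentences end at '.', '!' or '?' followed by whitespace,
    and words are runs of letters/apostrophes. A real pipeline would use
    a proper tokenizer and sentence splitter.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text contains no sentences or words")
    return {
        "token_count": len(words),
        # Average number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Average number of characters per word (an assumption; the
        # table does not state the unit).
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }


def protagonist_concentration(mention_counts: dict) -> float:
    """Percentage of all character mentions that refer to the main character.

    Assumption: `mention_counts` maps a character name to its number of
    mentions, and the main character is the most-mentioned one.
    """
    total = sum(mention_counts.values())
    if total == 0:
        raise ValueError("no character mentions")
    return 100.0 * max(mention_counts.values()) / total


features = length_features("The cat sat. The dog ran away!")
concentration = protagonist_concentration({"Ann": 6, "Bob": 3, "Cy": 1})
```

In this toy input there are 7 words in 2 sentences, so the average sentence length is 3.5, and the hypothetical character "Ann" accounts for 60% of all mentions.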

Figure 1
Distribution of publication dates of books in our sample.

Figure 2
Distribution of the average user rating on Goodreads for books in our sample. Includes only books with more than 9 ratings.

Figure 3
Distribution of the log-transformed number of ratings on Goodreads for books in our sample. Includes only books with more than 9 ratings.
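
The filtering and transformation behind Figures 2 and 3 can be sketched as follows. This is an illustration only: the field names `avg_rating` and `total_ratings` are hypothetical, and the base of the logarithm is not stated in the captions, so base 10 is assumed here.

```python
import math


def prepare_rating_data(books: list, min_ratings: int = 10) -> list:
    """Keep books with more than 9 ratings and log-transform the rating count.

    Assumptions: each book is a dict with hypothetical keys 'avg_rating'
    and 'total_ratings'; the log base (10) is a guess, since the figure
    caption only says "log-transformed".
    """
    prepared = []
    for book in books:
        if book["total_ratings"] >= min_ratings:  # i.e. > 9 ratings
            prepared.append(
                {
                    "avg_rating": book["avg_rating"],
                    "log_total_ratings": math.log10(book["total_ratings"]),
                }
            )
    return prepared


sample = [
    {"avg_rating": 4.5, "total_ratings": 9},     # dropped: too few ratings
    {"avg_rating": 3.8, "total_ratings": 10},
    {"avg_rating": 4.1, "total_ratings": 1000},
]
rows = prepare_rating_data(sample)
```

Dropping sparsely rated books before plotting keeps averages based on a handful of ratings from distorting the distribution.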

Figure 4
Distribution of author gender by genre.
