(1) Overview
Repository location
Open Science Framework. “Absorption in Online Book Reviews”, https://osf.io/kr4v6/
(German metadata corpus: https://osf.io/kwru5/; German annotation guidelines: https://osf.io/v6rca/)
Context
The main aim of this project was to translate, test, and adapt the English-language AbsORB annotation guidelines (Kuijpers et al., 2023a, 2023b) for working with German-language online book reviews. The English-language annotation guidelines were originally developed to validate the Story World Absorption Scale (SWAS; Kuijpers, Hakemulder, Tan & Doicaru, 2014), a self-report instrument that captures experiences of absorption during reading, by comparing the statements on the instrument to the unprompted expressions people use in online book reviews to describe their absorbing reading experiences (cf. Kuijpers, Lusetti, Lendvai & Rebora, 2024). Absorption is a reading experience during which the reader is focused on the narrative, emotionally engaged with its characters, and mentally transported to the story world (cf. Kuijpers et al., 2014). It has been empirically shown to predict both reading enjoyment and aesthetic appreciation (e.g., Green, Brock & Kaufman, 2004; Kuijpers, 2014).
By translating, testing, and adapting the annotation guidelines for German and annotating a corpus of German-language book reviews scraped from the Goodreads website, we aimed to investigate the prevalence of the absorption metaphor in different languages, to compare how it is used across those languages, and to contribute to the growing field of research on Digital Social Reading (cf. Pianzola, 2021).
(2) Method
Building the corpus
In the first phase of the data collection, at the end of 2022, we scraped approximately 96,000 German-language reviews of seven different genres (fantasy, romance, thriller, science fiction, historical fiction, horror, and mystery) from the Goodreads website. In a second phase (beginning of 2023), we scraped 3,000 reviews from the genre “Climate Fiction” for a side-project (cf. Loi, Lusetti & Kuijpers, 2025) and 8,000 reviews of books originally written in German by Swiss, German and Austrian authors (to see if we could find salient differences between reviews written about original German works and translated works).
From the total set of scraped reviews, we wanted to select the most interesting in terms of absorption, i.e., those with a high number of potential absorption indicators, to be annotated. A simple automated selection method was developed for this purpose. Unlike in the English-language AbsORB project (Kuijpers et al., 2023a), in which a machine learning method was used (cf. Lendvai et al., 2020; Rebora, Kuijpers & Lendvai, 2020), this method was based on counting, per review, the occurrences of certain keywords that had been identified as good absorption indicators during an initial annotation practice phase. For each book in our corpus, the five reviews with the highest keyword score were selected for the annotation phase. Despite its simplicity, this method proved highly effective. More specifically, the selection method consisted of the following steps:
1. Reviews that contained links were removed.
2. Reviews that were too short or too long were removed (the delimiting thresholds changed over the different phases of the project).
3. Reviews that contained images were removed, as the image would not be visible in our annotation tool but might be referred to in the review text, creating ambiguity.
4. Each remaining review received a score based on the number of keyword matches.
5. For each book, only the five highest-scoring reviews were retained for the annotation phase (since the goal was to annotate five reviews for each selected book, books with fewer than five reviews were discarded).
6. In order to match the number of reviews in our English-language corpus, only 494 of our 695 annotated German reviews were curated and included in the metadata corpus.
Of course, these criteria bias our sample, but since our project was concerned with investigating how people express themselves about absorption (rather than how often they talk about absorption), we needed to preselect absorption-rich reviews.
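The filtering and scoring steps above can be sketched in code. This is an illustrative reconstruction, not the project's actual implementation: the keyword list, length thresholds, and review fields used here are hypothetical placeholders.

```python
# Illustrative sketch of the keyword-based selection method described above.
# KEYWORDS and the length thresholds are hypothetical, not the project's values.
import re
from collections import defaultdict

KEYWORDS = {"gefesselt", "eintauchen", "verschlungen", "sog"}  # hypothetical
MIN_WORDS, MAX_WORDS = 50, 600  # hypothetical length thresholds


def keyword_score(text):
    """Count how many tokens of a review text match the keyword list."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for tok in tokens if tok in KEYWORDS)


def select_reviews(reviews, per_book=5):
    """reviews: list of dicts with 'book_id', 'text', and 'has_image' keys."""
    by_book = defaultdict(list)
    for r in reviews:
        text = r["text"]
        if "http" in text:                            # 1) drop reviews with links
            continue
        if not (MIN_WORDS <= len(text.split()) <= MAX_WORDS):  # 2) length filter
            continue
        if r.get("has_image"):                        # 3) drop reviews with images
            continue
        by_book[r["book_id"]].append((keyword_score(text), r))  # 4) score
    selected = []
    for book_id, scored in by_book.items():
        if len(scored) < per_book:                    # 5) need five reviews per book
            continue
        scored.sort(key=lambda pair: pair[0], reverse=True)
        selected.extend(r for _, r in scored[:per_book])
    return selected
```

A real pipeline would additionally need the curation step (6), which depends on manual decisions and is therefore not sketched here.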
In the final corpus, there are reviews of 99 different books by 85 different authors. Our original English corpus included reviews of 98 different books by 88 different authors; only four books appeared in both corpora. As the number of German-language reviews published on Goodreads is substantially smaller than the number of English reviews, we were not able to make the corpora fully comparable in terms of the books covered. However, because we were interested in the expression of absorption and how that potentially differed between the two languages, this did not hinder our project's aims.
As in the English-language AbsORB project, we decided to prepare two versions of our corpus: one with metadata but without the full review texts (this version is available under the OSF link presented in this paper), and one which does include the anonymised full-text reviews and can be requested from the authors. We made this decision based on the sensitive nature of the data with respect to both copyright and privacy law (APA, 2023).
Developing the annotation guidelines
The annotation guidelines were developed throughout the annotation process, which started in December 2022 and was divided into three practice rounds and seven actual rounds, the last of which was completed in November 2023. For a thorough description of the annotation process, we refer readers to our paper on the annotation process in the English-language AbsORB project (Kuijpers et al., 2024).
For the project presented here, the annotation categories that were developed over the course of the English-language AbsORB project were taken as the point of departure for the German annotation guidelines. After the initial translation of all AbsORB statements and categories, we continuously adjusted the German statements throughout the annotation process based on the language used in German-language reviews. We added example statements from reviews to the guidelines wherever possible, as well as comments for each statement to help other researchers decide whether a sentence belongs to a certain category or not. These comments are in some cases similar to those in the English-language guidelines, suggesting that the experience of absorption is rather similar, or at least similarly discussed, in both languages. For example, for most of the negated Transportation categories, we could not find any example in the reviews. This may indicate that regardless of the language used, reviewers do not seem to talk about "not feeling like I was in the story world"; they only talk about Transportation when it happened. This is interesting, as for other aspects of absorption, such as Attention or Impact, people do write about "not being able to concentrate" or their engagement "not being effortless", indicating, perhaps, that these aspects are more fundamental to the experience of absorption. In other cases, however, the comments in the annotation guidelines are particular to the German language of the corpus. For example, we added explanations of the difference between the Empathy and Sympathy categories, since very similar language is used for these in German: the concept of sympathy is generally rendered as "Mitgefühl", whereas the related verb "mitfühlen" is commonly used to describe empathy, but not sympathy.
Table 1 shows all of the annotation categories and how many times they were used to annotate segments of reviews in both the English and the German corpus. We decided to include the frequency values for the English corpus (and percentages) in order to facilitate comparisons in terms of cultural or linguistic differences between English-language reviews and German-language reviews. For example, Emotional Engagement seems important for readers’ feelings of absorption, in both the English and the German data. However, while it seems particularly important for German-language users to establish an emotional connection to the character (115 instances of EE3 versus only 69 instances in the English corpus), for English-language users their relationship to the characters seems to extend beyond the book, manifesting as imagined or desired parasocial relationships (79 instances of EE11 versus only 22 instances in the German corpus). Additionally, (the desire for) rereading books was vastly more common in English reviews than in German reviews (112 instances of IM2 versus 26 instances, respectively), whereas the feeling of Effortless Engagement was more common in German reviews than English reviews (100 instances of IM1 versus 68 instances, respectively).
Table 1
Annotation layers and categories, with the number of annotations per category (presence or negation of the category) and the proportion (in percentages) of reviews in which each category occurs, for the English and the German AbsORB corpora. Note. 12.15% of English reviews and 3.64% of German reviews did not include any annotations.
| LAYER | STATEMENT | ENGLISH ABSORPTION PRESENT | GERMAN ABSORPTION PRESENT | ENGLISH ABSORPTION NEGATED | GERMAN ABSORPTION NEGATED |
|---|---|---|---|---|---|
| Attention | A1 While reading time moved differently | 3 (0.61%) | 6 (1.21%) | 0 (0%) | 0 (0%) |
| | A2 My attention was focused on the book | 10 (2.02%) | 4 (0.61%) | 2 (0.2%) | 1 (0.2%) |
| | A3 I was absorbed by the book | 186 (29.55%) | 282 (43.32%) | 8 (1.01%) | 33 (6.07%) |
| | A4 I was not distracted while reading | 3 (0.4%) | 0 (0%) | 2 (0.4%) | 10 (0%) |
| | A5 While reading I forgot the world around me | 16 (3.04%) | 2 (0.4%) | 0 (0%) | 1 (0.2%) |
| | A6 I wanted to know what would happen next | 108 (18.42%) | 152 (25.3%) | 3 (0.61%) | 24 (4.25%) |
| | A7 I did not want to put the book down | 145 (21.86%) | 136 (24.7%) | 5 (0.81%) | 23 (3.85%) |
| Emotional Engagement | EE1 I could imagine what it must be like to be this character | 35 (5.47%) | 45 (8.5%) | 0 (0%) | 13 (2.63%) |
| | EE2 I sympathized with this character | 53 (8.91%) | 23 (4.66%) | 4 (0.81%) | 4 (0.81%) |
| | EE3 I felt a connection to this character | 69 (10.32%) | 115 (19.84%) | 10 (1.82%) | 15 (2.83%) |
| | EE4 I felt how this character was feeling | 72 (10.73%) | 67 (13.16%) | 1 (0.2%) | 3 (0.61%) |
| | EE5 I felt for what happened in the story | 79 (11.34%) | 84 (14.17%) | 0 (0%) | 4 (0.61%) |
| | EE6 I felt angry at the character | 19 (3.85%) | 11 (2.23%) | 1 (0.2%) | 0 (0%) |
| | EE7 I felt scared for the character | 5 (0.81%) | 6 (1.21%) | 0 (0%) | 0 (0%) |
| | EE8 I felt like I knew this character | 11 (2.23%) | 22 (4.45%) | 0 (0%) | 1 (0.2%) |
| | EE9 I wish I could be more like this character | 8 (1.01%) | 5 (1.01%) | 0 (0%) | 0 (0%) |
| | EE10 I understood why this character did this | 26 (4.86%) | 48 (8.5%) | 5 (1.01%) | 22 (4.25%) |
| | EE11 I want to have some kind of relationship with this character | 79 (11.94%) | 22 (3.64%) | 0 (0%) | 0 (0%) |
| | EE12 I wanted to involve myself in the story events | 42 (6.88%) | 10 (2.02%) | 0 (0%) | 0 (0%) |
| Mental Imagery | MS1 I could imagine what the characters looked like | 15 (2.63%) | 7 (1.42%) | 3 (0.2%) | 0 (0%) |
| | MS2 I could see the story events clearly in my mind | 20 (4.05%) | 8 (1.62%) | 0 (0%) | 1 (0.2%) |
| | MS3 I could imagine what the story world looked like | 25 (4.45%) | 51 (9.92%) | 2 (0.2%) | 0 (0%) |
| | MS4 The character/story world felt real to me | 73 (12.75%) | 61 (11.54%) | 0 (0%) | 8 (1.62%) |
| Transportation | T1 While reading this I was in the story world | 13 (2.63%) | 18 (3.24%) | 0 (0%) | 0 (0%) |
| | T2 Elements from the story world came into my world | 14 (2.43%) | 8 (1.42%) | 0 (0%) | 0 (0%) |
| | T3 The story world felt close to me | 4 (0.4%) | 15 (2.83%) | 0 (0%) | 0 (0%) |
| | T4 I felt transported to the story world | 19 (3.64%) | 54 (9.11%) | 0 (0%) | 0 (0%) |
| | T5 I felt part of the story world | 34 (5.26%) | 23 (4.45%) | 0 (0%) | 1 (0.2%) |
| | T6 I returned from a trip to the story world | 3 (0.4%) | 1 (0.2%) | 0 (0%) | 0 (0%) |
| | T7 I traveled with the characters through the story world | 26 (5.06%) | 30 (5.67%) | 0 (0%) | 0 (0%) |
| Impact | IM1 It was an easy read/I devoured this book | 68 (11.94%) | 100 (18.02%) | 40 (6.88%) | 79 (13.16%) |
| | IM2 I will reread this book/I have reread this book | 112 (18.22%) | 26 (4.45%) | 4 (0.61%) | 2 (0.4%) |
| | IM3 I cannot wait to see how this story unfolds in the next book | 167 (25.71%) | 176 (30.16%) | 3 (0.61%) | 4 (0.81%) |
| | IM4 I am addicted to this book/I cannot get enough | 89 (14.17%) | 21 (3.85%) | 2 (0.4%) | 1 (0.2%) |
| | IM5 This book stayed with me | 192 (24.09%) | 83 (14.37%) | 2 (0.4%) | 2 (0.4%) |
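Figures of the kind reported in Table 1 can be recomputed from per-review annotation data: the count is the total number of annotations of a category, while the percentage is the share of reviews in which the category occurs at least once. The sketch below assumes a hypothetical data structure mapping review IDs to the category codes annotated in them; it is not part of the released corpus.

```python
# Recompute Table-1-style figures from hypothetical per-review annotation data.
# `annotations` maps a review ID to the list of category codes annotated in it;
# a category may be annotated several times within the same review.
from collections import Counter


def category_stats(annotations, n_reviews):
    """Return, per category, (total annotation count, percentage of reviews
    in which the category occurs at least once)."""
    counts = Counter()
    review_hits = Counter()
    for review_id, cats in annotations.items():
        counts.update(cats)
        review_hits.update(set(cats))  # each review counted once per category
    return {
        cat: (counts[cat], round(100 * review_hits[cat] / n_reviews, 2))
        for cat in counts
    }
```

For example, a category annotated three times across two of 494 reviews would be reported as count 3 with a review-level proportion of 2/494.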
(3) Dataset Description
Repository name
Open Science Framework
Object name
The GeAbsORB (German Absorption in Online Reviews of Books) Corpus and Annotation Guidelines
Format names and versions
Corpus: .csv and .xlsx
Guidelines: .pdf
Creation dates
2022-12-01 – 2025-04-30
Dataset creators
Moniek Kuijpers, University of Basel
Massimo Lusetti, University of Basel
Lina Ruh, University of Basel
Johanna Vogelsanger, University of Basel
Tina Ternes, University of Basel
Language
Data: German
Metadata: English
Licence
CC BY 4.0 (Creative Commons Attribution 4.0 International)
Publication date
(4) Reuse Potential
The annotation guidelines can be used and adapted to tag reader responses other than online book reviews, such as open survey responses, interviews, or Shared Reading discourse. The full-text corpus could be expanded with additional annotation schemes (cf. Loi, Lusetti & Kuijpers, 2025). Additionally, the corpus could be used as it is to probe questions of differences in reader responses across genres or authors, complementing research in computational literary studies on distant reading (Moretti, 2013). However, we would like to emphasise the possibility of combining this corpus with the English-language corpus to investigate cultural, or rather linguistic, differences between readers and how they express themselves about reading in different languages. Furthermore, we hope this work may inspire other researchers to translate, test, and adapt the annotation scheme into even more languages and to add annotated corpora of book reviews in other languages to ours, allowing for a more extensive cross-cultural comparison of how readers express themselves about those reading experiences of intense immersion for which only metaphorical language seems appropriate.
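Combining the German metadata corpus with its English-language counterpart for cross-linguistic analysis could start from a simple merge that tags each row with its source language. This is a minimal sketch under assumed conditions: the file names are placeholders, and the actual corpus column schema may differ.

```python
# Hypothetical sketch: merge two metadata corpora (CSV files) into one table
# with an added "language" column for cross-linguistic comparison.
# File names and columns are illustrative, not the actual corpus schema.
import csv


def combine_corpora(paths_by_language, out_path):
    """paths_by_language: e.g. {"en": "absorb_en.csv", "de": "geabsorb_de.csv"}.
    Writes a combined CSV and returns the number of rows written."""
    rows = []
    fieldnames = ["language"]
    for language, path in paths_by_language.items():
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for name in reader.fieldnames or []:
                if name not in fieldnames:  # union of columns, in order seen
                    fieldnames.append(name)
            for row in reader:
                row["language"] = language
                rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Taking the union of column names lets the merge tolerate metadata fields present in only one of the two corpora.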
Acknowledgements
We would like to thank Antonia Vogler and Pema Frick for their reading of and input on the German language annotation guidelines.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
Moniek Kuijpers: funding acquisition, supervision, conceptualisation, data curation, writing/editing/review, project administration, methodology, resources, validation
Massimo Lusetti: data collection, data visualisation, methodology, investigation, formal analysis, data curation, software, validation
Lina Ruh: methodology, writing/editing/review, formal analysis, data curation, validation
Tina Ternes: data visualisation, methodology, data curation, software, validation
Johanna Vogelsanger: methodology, formal analysis, validation
