
Absorption in German Online Book Reviews. Presenting the German-Language AbsORB Metadata Corpus and Annotation Guidelines

Open Access | Nov 2025


(1) Overview

Repository location

Open Science Framework. “Absorption in Online Book Reviews”, https://osf.io/kr4v6/

(German metadata corpus: https://osf.io/kwru5/; German annotation guidelines: https://osf.io/v6rca/)

Context

The main aim of this project was to translate, test, and adapt the English-language AbsORB annotation guidelines (Kuijpers et al., 2023a, 2023b) for working with German-language online book reviews. The English-language annotation guidelines were originally developed to validate the Story World Absorption Scale (SWAS; Kuijpers, Hakemulder, Tan & Doicaru, 2014), a self-report instrument that captures experiences of absorption during reading, by comparing the statements on the instrument with the unprompted expressions people use in online book reviews to describe their absorbing reading experiences (cf. Kuijpers, Lusetti, Lendvai & Rebora, 2024). Absorption is a reading experience during which the reader is focused on the narrative, emotionally engaged with its characters, and mentally transported to the story world (cf. Kuijpers et al., 2014). It has been empirically shown to predict both reading enjoyment and aesthetic appreciation (e.g., Green, Brock & Kaufman, 2004; Kuijpers, 2014).

By translating, testing, and adapting the annotation guidelines for German and annotating a corpus of German-language book reviews scraped from the Goodreads website, we aimed to investigate the prevalence of the absorption metaphor in different languages, to compare how it is used across these languages, and to contribute to the growing field of research on Digital Social Reading (cf. Pianzola, 2021).

(2) Method

Building the corpus

In the first phase of the data collection, at the end of 2022, we scraped approximately 96,000 German-language reviews of seven different genres (fantasy, romance, thriller, science fiction, historical fiction, horror, and mystery) from the Goodreads website. In a second phase (beginning of 2023), we scraped 3,000 reviews from the genre “Climate Fiction” for a side-project (cf. Loi, Lusetti & Kuijpers, 2025) and 8,000 reviews of books originally written in German by Swiss, German and Austrian authors (to see if we could find salient differences between reviews written about original German works and translated works).

From the total number of scraped reviews, we wanted to select the most interesting in terms of absorption, i.e., those with a high number of potential absorption indicators, to be annotated. A simple automated selection method was developed for this purpose. Unlike the English-language AbsORB project (Kuijpers et al., 2023a), which used a machine learning method (cf. Lendvai et al., 2020; Rebora, Kuijpers & Lendvai, 2020), this method was based on counting, per review, the occurrences of certain keywords that had been identified as good absorption indicators during an initial annotation practice phase. For each book in our corpus, the five reviews with the highest keyword score were selected for the annotation phase. Despite its simplicity, this method proved to be highly effective. More specifically, the selection method consisted of the following steps:

1) Reviews that contained links were removed.
2) Reviews that were too short or too long were removed (the delimiting thresholds changed over the different phases of the project).
3) Reviews that contained images were removed, as the image would not be visible in our annotation tool but might be referred to in the review text, creating ambiguity.
4) Each remaining review received a score based on the number of keyword matches.
5) For each book, only the five highest-scoring reviews were retained for the annotation phase (since the goal was to annotate five reviews per book, books with fewer than five reviews were discarded).
6) To match the number of reviews in our English-language corpus, only 494 of our 695 annotated German reviews were curated and included in the metadata corpus.

Of course, these criteria bias our sample, but since our project was concerned with investigating how people express themselves about absorption (rather than how often they talk about absorption), we needed to preselect absorption-rich reviews.
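The selection procedure described above can be sketched in a few lines of Python. This is an illustrative reconstruction only: the keyword list, the length thresholds, and the field names are hypothetical placeholders, not the values actually used in the project.

```python
import re

# Hypothetical values for illustration; the project's real keyword list and
# length thresholds (which varied across phases) are not reproduced here.
KEYWORDS = ["gefesselt", "verschlungen", "eintauchen"]
MIN_WORDS, MAX_WORDS = 30, 600

def keyword_score(text):
    """Count keyword occurrences (case-insensitive) in a review text."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(k), lowered)) for k in KEYWORDS)

def select_reviews(reviews_by_book):
    """reviews_by_book maps a book id to a list of review records of the
    (assumed) form {'text': str, 'has_link': bool, 'has_image': bool}."""
    selected = {}
    for book, reviews in reviews_by_book.items():
        # Steps 1-3: drop reviews containing links or images,
        # and reviews whose length falls outside the thresholds.
        kept = [r for r in reviews
                if not r["has_link"] and not r["has_image"]
                and MIN_WORDS <= len(r["text"].split()) <= MAX_WORDS]
        # Step 4: score remaining reviews by keyword matches.
        scored = sorted(kept, key=lambda r: keyword_score(r["text"]),
                        reverse=True)
        # Step 5: keep the five highest-scoring reviews per book;
        # books left with fewer than five reviews are discarded.
        if len(scored) >= 5:
            selected[book] = scored[:5]
    return selected
```

The same top-five-per-book logic would apply regardless of how the score is computed, so a learned scorer (as in the English-language project) could be swapped in for `keyword_score` without changing the surrounding pipeline.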
In the final corpus, there are reviews on 99 different books by 85 different authors. Our original English corpus included reviews on 98 different books by 88 different authors; only 4 books appeared in both corpora. As the number of German-language reviews published on Goodreads is substantially smaller than the number of English reviews, we were not able to make the corpora fully comparable in terms of the books and authors covered. However, because we were interested in the expression of absorption and how it potentially differed between the two languages, this did not hinder our project’s aims.

As in the English-language AbsORB project, we decided to prepare two versions of our corpus: one with metadata but without the full review texts (this version is available at the OSF link presented in this paper), and one that includes the anonymised full-text reviews and can be requested from the authors. We made this decision based on the sensitive nature of the data with respect to both copyright and privacy law (APA, 2023).

Developing the annotation guidelines

The annotation guidelines were developed throughout the annotation process, which started in December 2022 and comprised three practice rounds and seven annotation rounds, the last of which was completed in November 2023. For a thorough description of the annotation process, we refer readers to our paper on the annotation process in the English-language AbsORB project (Kuijpers et al., 2024).

For the project presented here, the annotation categories developed over the course of the English-language AbsORB project were taken as the point of departure for the German annotation guidelines. After an initial translation of all AbsORB statements and categories, we continuously adjusted the German statements throughout the annotation process based on the language used in German-language reviews. We added example statements from the reviews to the guidelines wherever possible, as well as comments for each statement to help other researchers decide whether a sentence belongs to a certain category. In some cases these comments are similar to those in the English-language guidelines, suggesting that the experience of absorption is rather similar, or at least similarly discussed, in both languages. For example, for most of the negated Transportation categories we could not find any example in the reviews. This may indicate that, regardless of the language used, reviewers do not talk about “not feeling like I was in the story world”; they only mention Transportation when it happened. This is interesting because for other aspects of absorption, such as Attention or Impact, people do write about “not being able to concentrate” or their engagement “not being effortless”, indicating, perhaps, that these aspects are more fundamental to the experience of absorption. In other cases, however, the comments in the annotation guidelines are particular to the German language of the corpus. For example, we added explications on the difference between the Empathy and Sympathy categories, since very similar language is used for these in German: the concept of sympathy is generally rendered as “Mitgefühl”, whereas the related verb “mitfühlen” is commonly used to describe empathy, but not sympathy.

Table 1 shows all of the annotation categories and how many times they were used to annotate segments of reviews in both the English and the German corpus. We decided to include the frequency values for the English corpus (and percentages) in order to facilitate comparisons in terms of cultural or linguistic differences between English-language reviews and German-language reviews. For example, Emotional Engagement seems important for readers’ feelings of absorption, in both the English and the German data. However, while it seems particularly important for German-language users to establish an emotional connection to the character (115 instances of EE3 versus only 69 instances in the English corpus), for English-language users their relationship to the characters seems to extend beyond the book, manifesting as imagined or desired parasocial relationships (79 instances of EE11 versus only 22 instances in the German corpus). Additionally, (the desire for) rereading books was vastly more common in English reviews than in German reviews (112 instances of IM2 versus 26 instances, respectively), whereas the feeling of Effortless Engagement was more common in German reviews than English reviews (100 instances of IM1 versus 68 instances, respectively).

Table 1

Annotation layers and categories, with the number of annotations per category (presence or negation of the category) and proportions (in percentages) of the overall number of annotations of each category per review, for the English and the German AbsORB corpora. Note: 12.15% of English reviews and 3.64% of German reviews did not include any annotations.

| Layer | Statement | English absorption present | German absorption present | English absorption negated | German absorption negated |
| --- | --- | --- | --- | --- | --- |
| Attention | A1 While reading time moved differently | 3 (0.61%) | 6 (1.21%) | 0 (0%) | 0 (0%) |
| Attention | A2 My attention was focused on the book | 10 (2.02%) | 4 (0.61%) | 2 (0.2%) | 1 (0.2%) |
| Attention | A3 I was absorbed by the book | 186 (29.55%) | 282 (43.32%) | 8 (1.01%) | 33 (6.07%) |
| Attention | A4 I was not distracted while reading | 3 (0.4%) | 0 (0%) | 2 (0.4%) | 10 (0%) |
| Attention | A5 While reading I forgot the world around me | 16 (3.04%) | 2 (0.4%) | 0 (0%) | 1 (0.2%) |
| Attention | A6 I wanted to know what would happen next | 108 (18.42%) | 152 (25.3%) | 3 (0.61%) | 24 (4.25%) |
| Attention | A7 I did not want to put the book down | 145 (21.86%) | 136 (24.7%) | 5 (0.81%) | 23 (3.85%) |
| Emotional Engagement | EE1 I could imagine what it must be like to be this character | 35 (5.47%) | 45 (8.5%) | 0 (0%) | 13 (2.63%) |
| Emotional Engagement | EE2 I sympathized with this character | 53 (8.91%) | 23 (4.66%) | 4 (0.81%) | 4 (0.81%) |
| Emotional Engagement | EE3 I felt a connection to this character | 69 (10.32%) | 115 (19.84%) | 10 (1.82%) | 15 (2.83%) |
| Emotional Engagement | EE4 I felt how this character was feeling | 72 (10.73%) | 67 (13.16%) | 1 (0.2%) | 3 (0.61%) |
| Emotional Engagement | EE5 I felt for what happened in the story | 79 (11.34%) | 84 (14.17%) | 0 (0%) | 4 (0.61%) |
| Emotional Engagement | EE6 I felt angry at the character | 19 (3.85%) | 11 (2.23%) | 1 (0.2%) | 0 (0%) |
| Emotional Engagement | EE7 I felt scared for the character | 5 (0.81%) | 6 (1.21%) | 0 (0%) | 0 (0%) |
| Emotional Engagement | EE8 I felt like I knew this character | 11 (2.23%) | 22 (4.45%) | 0 (0%) | 1 (0.2%) |
| Emotional Engagement | EE9 I wish I could be more like this character | 8 (1.01%) | 5 (1.01%) | 0 (0%) | 0 (0%) |
| Emotional Engagement | EE10 I understood why this character did this | 26 (4.86%) | 48 (8.5%) | 5 (1.01%) | 22 (4.25%) |
| Emotional Engagement | EE11 I want to have some kind of relationship with this character | 79 (11.94%) | 22 (3.64%) | 0 (0%) | 0 (0%) |
| Emotional Engagement | EE12 I wanted to involve myself in the story events | 42 (6.88%) | 10 (2.02%) | 0 (0%) | 0 (0%) |
| Mental Imagery | MS1 I could imagine what the characters looked like | 15 (2.63%) | 7 (1.42%) | 3 (0.2%) | 0 (0%) |
| Mental Imagery | MS2 I could see the story events clearly in my mind | 20 (4.05%) | 8 (1.62%) | 0 (0%) | 1 (0.2%) |
| Mental Imagery | MS3 I could imagine what the story world looked like | 25 (4.45%) | 51 (9.92%) | 2 (0.2%) | 0 (0%) |
| Mental Imagery | MS4 The character/story world felt real to me | 73 (12.75%) | 61 (11.54%) | 0 (0%) | 8 (1.62%) |
| Transportation | T1 While reading this I was in the story world | 13 (2.63%) | 18 (3.24%) | 0 (0%) | 0 (0%) |
| Transportation | T2 Elements from the story world came into my world | 14 (2.43%) | 8 (1.42%) | 0 (0%) | 0 (0%) |
| Transportation | T3 The story world felt close to me | 4 (0.4%) | 15 (2.83%) | 0 (0%) | 0 (0%) |
| Transportation | T4 I felt transported to the story world | 19 (3.64%) | 54 (9.11%) | 0 (0%) | 0 (0%) |
| Transportation | T5 I felt part of the story world | 34 (5.26%) | 23 (4.45%) | 0 (0%) | 1 (0.2%) |
| Transportation | T6 I returned from a trip to the story world | 3 (0.4%) | 1 (0.2%) | 0 (0%) | 0 (0%) |
| Transportation | T7 I traveled with the characters through the story world | 26 (5.06%) | 30 (5.67%) | 0 (0%) | 0 (0%) |
| Impact | IM1 It was an easy read/I devoured this book | 68 (11.94%) | 100 (18.02%) | 40 (6.88%) | 79 (13.16%) |
| Impact | IM2 I will reread this book/I have reread this book | 112 (18.22%) | 26 (4.45%) | 4 (0.61%) | 2 (0.4%) |
| Impact | IM3 I cannot wait to see how this story unfolds in the next book | 167 (25.71%) | 176 (30.16%) | 3 (0.61%) | 4 (0.81%) |
| Impact | IM4 I am addicted to this book/I cannot get enough | 89 (14.17%) | 21 (3.85%) | 2 (0.4%) | 1 (0.2%) |
| Impact | IM5 This book stayed with me | 192 (24.09%) | 83 (14.37%) | 2 (0.4%) | 2 (0.4%) |

(3) Dataset Description

Repository name

Open Science Framework

Object name

The GeAbsORB (German Absorption in Online Reviews of Books) Corpus and Annotation Guidelines

Format names and versions

Corpus: .csv and .xlsx

Guidelines: .pdf

Creation dates

2022-12-01 – 2025-04-30

Dataset creators

Moniek Kuijpers, University of Basel

Massimo Lusetti, University of Basel

Lina Ruh, University of Basel

Johanna Vogelsanger, University of Basel

Tina Ternes, University of Basel

Language

Data: German

Metadata: English

Licence

Creative Commons Attribution 4.0 International (CC BY 4.0)

Publication date

2025-11-20

(4) Reuse Potential

The annotation guidelines can be used and adapted to tag reader responses other than online book reviews, such as open responses in surveys, interviews, or Shared Reading discourse. The full-text corpus could be expanded with additional annotation schemes (cf. Loi, Lusetti & Kuijpers, 2025). Additionally, the corpus could be used as-is to probe questions about differences in reader responses across genres or authors, complementing research being done in the field of computational literary studies on distant reading (Moretti, 2013). However, we would like to emphasise the possibility of combining this corpus with the English-language corpus to investigate cultural, or rather linguistic, differences between readers and how they express themselves about reading in different languages. Furthermore, we hope this work may inspire other researchers to translate, test, and adapt the annotation scheme into even more languages, and to add annotated corpora of book reviews in other languages to ours, enabling a more extensive cross-cultural comparison of how readers express themselves about those reading experiences of intense immersion for which only metaphorical language seems to be appropriate.
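As a minimal sketch of the cross-corpus comparison suggested above, the two metadata corpora could be loaded and their annotation categories tallied side by side. The file paths and the `category` column name are assumptions for illustration; the actual CSV schema should be checked against the files on OSF.

```python
import csv
from collections import Counter

def category_counts(path, column="category"):
    """Tally annotations per category from a metadata CSV, assuming one
    row per annotation and a column (here hypothetically named
    'category') holding the category code, e.g. 'A3' or 'EE11'."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row[column] for row in csv.DictReader(f))

def compare(english_path, german_path):
    """Return {category: (english_count, german_count)} across
    both corpora, including categories present in only one."""
    en = category_counts(english_path)
    de = category_counts(german_path)
    return {cat: (en.get(cat, 0), de.get(cat, 0)) for cat in set(en) | set(de)}
```

Such paired counts are exactly what Table 1 reports, so a reanalysis along these lines (e.g. normalising by corpus size, or grouping by annotation layer) would be straightforward to build on top.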

Acknowledgements

We would like to thank Antonia Vogler and Pema Frick for their reading of and input on the German language annotation guidelines.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

Moniek Kuijpers: funding acquisition, supervision, conceptualisation, data curation, writing/editing/review, project administration, methodology, resources, validation

Massimo Lusetti: data collection, data visualisation, methodology, investigation, formal analysis, data curation, software, validation

Lina Ruh: methodology, writing/editing/review, formal analysis, data curation, validation

Tina Ternes: data visualisation, methodology, data curation, software, validation

Johanna Vogelsanger: methodology, formal analysis, validation

DOI: https://doi.org/10.5334/johd.375 | Journal eISSN: 2059-481X
Language: English
Submitted on: Aug 18, 2025
Accepted on: Oct 4, 2025
Published on: Nov 20, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Moniek M. Kuijpers, Massimo Lusetti, Lina Ruh, Tina Ternes, Johanna Vogelsanger, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.