Decoding hotel reviewers: Insights from a decision tree analysis

Open Access
|Jun 2025

1
Introduction

This study aims to examine the factors that influence travelers’ propensity to share their hotel experiences online, taking the use of different online platforms and the sociodemographic traits of the customer as independent variables. Online platforms for gathering information about hotels are Online Travel Agencies (OTAs), such as OTA1 (Booking™), OTA2 (Expedia™), OTA3 (Homeaway™), OTA4 (Trivago™), OTA5 (Kayak™), OTA6 (AirBnB™), OTA7 (Hostelworld™), and OTA8 (Momondo™); Review Sites (RSs), such as RS1 (TripAdvisor™) and RS2 (Google reviews™); and Social Media platforms (SMPs), such as SMP1 (Facebook™), SMP2 (Instagram™), SMP3 (Twitter™), and SMP4 (YouTube™). The list of the most commonly used OTAs and travel review sites was based on traffic ranking data from Statista (2019).

In the hotel industry, consumers typically engage with offerings initially through websites and the online environment. The lack of information about actual product features and the inability to physically assess them make electronic word of mouth (eWOM) one of the primary channels through which consumers make their booking decisions (Fan et al., 2018), because of its accessibility to a vast number of reviews (Buhalis & Law, 2008). Hennig-Thurau et al. (2004) defined eWOM as any favorable or unfavorable comment made by potential, existing, or past customers regarding a product or organization, accessible to a wide audience of people and institutions through the Internet. According to Fine et al. (2017), consumers’ engagement in sharing their hospitality experience via eWOM varied based on their chosen review platform, revealing significant differences among consumers who identified their preferred platforms for reviews as SMP1, RS1, or RS2, and those who remained neutral without showing a preference for any particular option. Ghazi (2017) studied the reasons why travelers leave reviews on RS1; the main reasons for positive reviews are helping hotels and social benefits, while negative ones are expressing negative emotions, cautioning other consumers, and social benefits.

Given the significant impact of positive eWOM on consumer choices (and consequently on the economic outcomes in the hotel industry) and the relatively small percentage of users inclined to share their reviews (Serra-Cantallops et al., 2020), it seems very relevant to identify a customer profile prone to provide eWOM (Gössling et al., 2018).

Using data drawn from a survey of travelers about their behavior in relation to writing eWOM reviews, this study uses one of the white-box machine learning (ML) techniques, the decision tree (DT) method, to look for combinations of variable values that describe the probability of writing reviews; in particular, it finds the combinations that give the highest and lowest values of the intention to write reviews.

This study identifies the most important triggers for writing eWOM reviews and the combination of values that increases the probability of writing reviews. The use of SMP1 while looking for information about hotels appears as the most relevant trigger, and the intensive use of SMP1 combined with the “employed” work status provides the highest probability of writing hotel reviews.

The rest of this article is structured as follows. First, we introduce the state of the art of the provision of eWOM and outline the research questions; then, a section is devoted to the methodology used and data collection via a consumer survey. The next section presents the results of the classification technique and a comparison with an alternative logistic regression, considered as a reasonable benchmark. We then present the discussion and conclusions of this work. The final section sets out theoretical and practical implications and highlights the limitations and possible directions for future research.

2
Literature review

Extant research studies the impact of eWOM on hotels’ customer-booking decisions (Fan et al., 2018; Tsao et al., 2015). Online reviews from travelers serve as an essential resource for information in the hospitality industry and have been studied in terms of their different aspects, such as source credibility, perceived risk, and information usefulness (Gonzalez-Rodriguez et al., 2022). Online platforms have become important sources of information open to the general public, where content is shared as online reviews that capture travelers’ experiences during their tours (Elliott, 2020). These reviews provide meaningful insights into travelers’ levels of satisfaction or dissatisfaction, their engagement as customers, their sentiments, and the challenges they encounter with hospitality services (Roy et al., 2021; Zhao & Wang, 2019). In contemporary travel, online reviews have assumed a pivotal role in the evaluation of customers’ accommodation experiences as well as their sentiment toward hospitality services. This evaluation process can result in behavioral outcomes, including the likelihood of revisiting a hotel, overall satisfaction levels, and future booking intentions (Park et al., 2020; Zhao & Wang, 2019). The current trend is to use sentiment analysis and natural language processing (NLP) to explore the relationship between different aspects of hotel amenities and services and various customer sentiments.

Some authors have studied the influence of positive reviews on consumer trust and attitudes (Gavilan et al., 2018; Song et al., 2022). Le and Ryu (2023) studied the impact of negative reviews by vloggers (video bloggers) on booking intention in the hotel industry. Gregoriades et al. (2021) studied ways to measure tourist satisfaction (e.g., sentiment analysis and review scores) and how to optimize the message in the context of a marketing campaign. Significant academic attention has been directed toward devising strategies to alleviate the effect of negative online reviews within the hospitality sector. Such insights are instrumental for hospitality professionals in refining their strategic planning and marketing efforts (Chakraborty, 2019; Yousaf & Kim, 2023).

Shahid and Paul (2022) studied guests’ experiences in luxury hotels, which were significantly shaped by hedonism, ambiance, escapism, personalization, and convenience. They also found that these enhanced experiences, in turn, were instrumental in encouraging positive eWOM behavior and strengthening consumers’ intentions to revisit. Hotels can benefit greatly from positive eWOM, since positive eWOM can contribute to higher booking rates and enhanced customer satisfaction; therefore, it is relevant for hotels to monitor and leverage eWOM to improve their services and integrate it into their marketing strategies (Gössling et al., 2018). Ghazi (2016) discussed how hotel marketers encourage positive online reviews and manage positive and negative eWOM. His main insights were that marketing managers should focus on sponsoring opinion leaders, reminding/rewarding reviewers, using eWOM campaigns, and using online communities as priority number one.

The personality of the reviewer has also been studied. Filieri et al. (2019) observed that highly unfavorable reviews can be perceived as more useful when they are detailed, easy to comprehend, and when the reviewer is either a specialist or discloses their identity. These findings provide actionable insights for hotel managers to better identify potentially valuable reviews. Research on eWOM provision is also wide, going from studying customer eWOM behavior (Alwash et al., 2019; Ismagilova et al., 2021) to studying the factors producing engagement at the time of writing reviews (Naumann et al., 2020). Some social networks such as SMP1 have been studied in the context of eWOM. The study conducted by Chen et al. (2013) found that SMP1 users join Fan Pages to receive information about products and that their unique needs and personality traits significantly influence their willingness to engage in eWOM activities on these pages. Specifically, users’ unique needs influence their use of the “Like” and “Share” buttons to express their motivations and readiness to share different kinds of content. Bastrygina et al. (2024) reviewed the literature on engagement conceptualization in marketing with a special focus on social media influencers (SMIs). Their study delves into the driving motivations for following SMIs and the subsequent outcomes from such behaviors, focusing on SMP2 influencers in hospitality and tourism and how brands can improve the efficiency of optimizing consumer engagement.

Some recent studies relate demographic factors, such as the age of tourists, to their propensity to provide feedback through SMP1 and the content of their comments. Li et al. (2023) state that the millennial generation is prone to using social networks to share information, and Zhong et al. (2023) delve into the aspects that most influence the perceptions and recommendations of senior audiences when traveling with their family members. Specifically, they found different impacts of service delivery, built environment, social environment, and hospitality amenities. Fine et al. (2017) tested the hypothesis that age has a negative relationship with eWOM review behavior engagement, which is supported by their data.

Regarding the influence of personality traits in engaging with eWOM behavior, Hu and Kim (2018) suggest that individuals high in openness are more likely to engage in eWOM as they are eager to share novel experiences and discoveries. They may use online platforms to express their creativity and appreciation for unique products or services. Yen and Tang (2015) found that extroverted individuals are sociable, outgoing, and seek social interactions, showing a positive correlation between extroversion and eWOM participation, while other motivations might be related to consumers’ demographics, experiences, or pre-existing attitudes. Their study also reveals that choosing between RS1 and SMP1 is correlated with different sets of motivations. These findings suggest that motivations are not universally equal and that eWOM behaviors may be correlated with different motivations. Ismagilova et al. (2021) synthesized findings from existing studies on eWOM by employing meta-analysis, which helps reconcile conflicting findings on factors affecting consumers’ intention to engage in eWOM communications. They divided the factors influencing eWOM provision behavior into four categories: personal, social, perceptual, and consumption-based.

This study focuses on the triggers of eWOM writing, not on its consequences. Some studies considered internal (Guo et al., 2017) or external information regarding eWOM content to explore the factors that trigger eWOM behavior (e.g., Yen & Tang, 2019); internal factors refer to the actual eWOM content, while external factors refer to other features, such as demographic information of eWOM’s producer (Gregoriades et al., 2021). In the latter aspect, the authors find a research gap in assessing the connection between customers’ propensity to use specific online platforms to gather information about hotels and the propensity to write a hotel review. Other external factors related to customer profile, such as age, gender, work status, household situation, education level, and income level, were also tested in the model as independent variables. A conceptual model framework of the variables used in this study is shown in Figure 1.

Figure 1

Conceptual model framework.

To sum up, while extensive literature has examined the impact of eWOM on consumer decision-making in the hospitality industry – covering topics such as review content, sentiment, reviewer credibility, and consumer response – less attention has been devoted to the antecedents of eWOM creation, particularly the role of specific online platforms in shaping users’ willingness to contribute reviews. Existing studies tend to focus either on the consequences of eWOM or on generalized motivational factors, often neglecting how platform-specific behaviors interact with individual sociodemographic characteristics to influence review-writing. Few works address the asymmetrical and non-linear nature of these relationships. This study addresses this gap by examining the influence of platform engagement and user profiles on the propensity to write hotel reviews.

The set of online platforms (OTAs, review sites, social networks, etc.) has been selected using their traffic rank, and the use of propensity DT is proposed in this study, as they allow for the expression of non-linear relationships among variables, including the asymmetric relationship between the use of online platforms and sociodemographic factors and the propensity to write eWOM. After considering domain knowledge, the research questions (RQ) are as follows:

RQ1: What are the most important triggers for writing eWOM for hotels?

RQ2: What is the best combination of triggers for hotel eWOM writing?

3
Methodology
3.1
Questionnaire design

Before creating the questionnaire, two focus groups were carried out, dividing consumers into two groups according to their age (18–34 and 35–60). The most relevant elements were identified during this qualitative part of the design. We also followed the work of Hyun and Park (2016) and Wang et al. (2012), adjusted to tourism studies, as in Ek-Styvén and Foster (2018), and the work of Wang (2018) on factors that influence travelers’ use of eWOM and their ability to generate eWOM content in return.

The questionnaire and scales were validated through a pilot test, run on 50 consumers varying in age and background, which led to the correction of technical expressions that were not clear for the general population.

Four sections were used to structure the questionnaire. The first section consisted of an introduction and a screening question (Q1), to ensure that only consumers who search for tourist accommodations online were selected. The second section focused on identifying the specific platforms where customers are likely to read reviews (Q2). The third section, the core of the study, examined how inclined customers feel to write reviews after their stay (Q3). The final section contained six sociodemographic questions that enabled the analysis: gender, age, work status, household situation, educational level, and net monthly income.

3.2
Description of variables

The variables that were relevant to the research questions are shown in Table 1, indicating the questions and types of answers.

Table 1

Description of variables of questionnaire.

| Question | Type of answer |
|---|---|
| Q1: Do you use or have you ever used the Internet (on computers, tablets, cell phones) to search for hotel information? | Yes/No (end of survey) |
| Q2: Please check the sites or platforms where you usually look for information about hotels, indicating the ones you use the most. | Q2 is subdivided into 14 variables, each one of them specifying a platform/site. Respondents need to evaluate all of them individually on a 4-point Likert scale: 1 never, 2 little, 3 quite a lot, 4 the most. Variables: Q2_OTA1; Q2_RS1; Q2_RS2; Q2_OTA3; Q2_OTA2; Q2_OTA4; Q2_OTA5; Q2_OTA6; Q2_SMP1; Q2_SMP2; Q2_SMP4; Q2_SMP3; Q2_OTA7; Q2_OTA8 |
| Q3: How likely are you to write an opinion after staying in a hotel (approximately how many times do you do it)? | 5-point Likert scale: never, rarely, about half of the time, often, and always |
| Gender | Male/female |
| Age | 18–21, 22–30, 31–45, 46–65, 66–80, over 80 |
| Household status: I live: | Alone; with my partner; with friends; with my partner and children; with my family (parents, siblings, etc.) |
| Work status | Not currently employed (studying, unemployed, retired); self-employed; work as a salaried employee in a small company; work as a salaried employee in a large company; manager in a small company; manager in a large company |
| Educational level | School graduate; intermediate vocational training; higher vocational training; university degree/graduated; higher studies (Master’s degree, doctorate, etc.) |
| Net monthly income (in euros) | Less than 1,000; 1,001–2,000; 2,001–3,000; 3,001–4,000; 4,001–5,000; 5,001–6,000; over 6,000 |
Source: Authors’ own research.
3.3
Sample and data

During December 2022 and January 2023, a convenience sampling method (Malhotra & Birks, 2007) was used to conduct a self-administered survey online that is aimed at users of online tourism opinion platforms. This sampling method is suitable when the study population’s limits are unknown or very wide (Goodman, 1961).

Regarding ethical approval, this research is low risk in nature and we have followed best practice as human participants were involved: (1) all participants received detailed written information about the study and its procedures; (2) no health-related data, either directly or indirectly, were collected, and therefore, the Declaration of Helsinki was not specifically referenced when informing participants; (3) data handling was restricted with data under custody by one of the researchers; (4) the anonymity of all collected data was strictly maintained throughout; (5) ethical approval from a board or committee was not obtained, as it was not required under the relevant institutional and national guidelines; (6) informed consent for the data to be used in research was explicitly requested as a preliminary question in the questionnaire before completing the survey.

In Madrid, Spain’s capital, a first sample of 200 people ranging from 20 to 75 years old received the questionnaire and were instructed to pass it on to their friends. A total of 788 questionnaires were obtained; after discarding 49 with missing responses, 739 valid surveys remained. With this sample size, the margin of error for the 95% confidence interval was 3.6%.
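The sample-size arithmetic above can be checked with a short sketch. It assumes the usual normal-approximation formula for a proportion with the worst-case value p = 0.5 and z = 1.96; the article does not state which formula was used, and the formula strictly presumes simple random sampling, which a convenience sample does not guarantee. The function name `margin_of_error` is illustrative.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes simple random sampling; p = 0.5 is the worst case (widest interval).
    """
    return z * math.sqrt(p * (1.0 - p) / n)

# Figures taken from the text: 788 questionnaires, 49 with missing responses.
valid_surveys = 788 - 49
moe = margin_of_error(valid_surveys)

print(f"valid surveys: {valid_surveys}")
print(f"margin of error at 95%: {100 * moe:.1f}%")
```

Under these assumptions the result agrees with the 3.6% reported in the text.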

Among the respondents, the percentage of males was 40.9%, while the percentage of females was 59.1%. Age was centered around 22–30 (34%) and 31–45 (25%). Both age brackets are over-represented, but we consider that they are the more intensive users of social media. A total of 47% of the sample had a bachelor’s degree. The most frequent household situation is to live “with my family (parents, siblings, etc.),” with 40% of the respondents. Regarding the income level, respondents concentrate in the lower ranges (i.e., less than 1,000 and from 1,001 to 2,000), with 79% of respondents.

3.4
Statistical tools

The analysis of the questionnaire is framed as a classification problem, which belongs to supervised learning, a branch of ML. These techniques have been widely utilized since their appearance. More specifically, DTs were used, following the original work of Breiman et al. (1984).

When faced with a range of possible decisions, a DT assists in making more accurate decisions from a probability perspective (Hastie et al., 2009) and makes it possible to analyze the results and see visually how the model flows. Another advantage of DTs is that they are considered “white box” techniques (as opposed to “black box”): the algorithm’s logic is not obscured, so they provide an interpretable explanation, which increases trustworthiness (Gregoriades et al., 2021). Recent applications of DTs in the fields of economics and business include those of Liu and Yang (2022) and Rosado-Cubero et al. (2022). Using this technique, we were able to examine the effect of every variable on the intention to write reviews. DTs are constructed through a recursive partitioning method. Using a feature value, data are separated into distinct groups at each node, resulting in subsets that are subdivided again into smaller subsets.

Let $U=\{X_1,\ldots,X_p\}$ be a set of independent variables measured on a set $\Omega$ of objects. A DT is a directed acyclic rooted tree. Both a single variable $X_k$ of $U$ and a subset of objects in $\Omega$ are recursively associated with each node $k$ of the DT as follows: all objects in $\Omega$ can be found in the root node. Let $k$ be a node and $S_k$ be the subset of $\Omega$ associated with $k$. For every different value $v_k$ of attribute $X_k$, there is a child $C_k$ of $k$, and the set of objects associated with $C_k$ are the objects of $S_k$ for which the value of attribute $X_k$ is $v_k$. Merging the categories of each predictor was performed if they did not significantly differ from the dependent variable. A node is a leaf if either the set of objects associated with it contains objects of the same class, according to a specific dependent variable, or if a stopping criterion (i.e., number of levels of the tree) is met. We chose the division model chi-squared automatic interaction detection (CHAID; Kass, 1980). CHAID selects the variable that has the most significant relationship with the dependent variable at each step and then uses the chi-square independence test to establish the splitting rule for every node. Given a dependent variable $Y$ with $J$ categories, for each independent variable $X_k$ with $N_k$ categories, the Pearson chi-square statistic is calculated as

$$X^2=\sum_{j=1}^{J}\sum_{i=1}^{N_k}\frac{(n_{ij}-\widehat{m}_{ij})^2}{\widehat{m}_{ij}},$$

where $n_{ij}$ is the observed cell frequency and $\widehat{m}_{ij}$ is the expected cell frequency for cell $(x_k=i,\ y=j)$ from the independence model. The p value was calculated as $p=\Pr(\chi^2>X^2)$, where $\chi^2$ follows a chi-square distribution with $d=(J-1)(N_k-1)$ degrees of freedom.
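As an illustration of the Pearson statistic described above, the following minimal sketch computes it for a small contingency table, with expected frequencies taken from the independence model (row total times column total, divided by the grand total). The counts are hypothetical and are not taken from the survey; the function name `pearson_chi_square` is likewise illustrative.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table[i][j] is the observed count for predictor category i and
    dependent-variable category j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of predictor and outcome.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(col_totals) - 1) * (len(row_totals) - 1)
    return statistic, degrees_of_freedom

# Hypothetical 2x2 table: rows = low/high platform use, columns = non-writer/writer.
example = [[30, 10],
           [20, 40]]
stat, dof = pearson_chi_square(example)
print(f"X^2 = {stat:.3f}, degrees of freedom = {dof}")
```

The p value would then be the upper-tail probability of a chi-square distribution with `dof` degrees of freedom, which a statistics library can supply (for instance `scipy.stats.chi2.sf(stat, dof)`, if SciPy is available).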

The graphical representation provided by this method is a significant advantage as it enables us to identify the combination of predictive variables that produce the highest (and also the lowest) values of writing review intention. However, the main limitation is the possibility of overfitting the data.

We evaluated the quality of the predictions using the classification table, which presents the observed versus predicted (by the model) values, with the calculations of the percentages of correct classification overall (accuracy), and for each group (sensitivity and specificity).

IBM SPSS Statistics for Windows (v 29.0) was used. The significance level for splitting nodes and merging categories was kept at the default value of 0.05. The likelihood ratio method was chosen for the calculation of the chi-square statistic. This method is more robust than Pearson’s, although it takes a longer time to calculate. This is the preferred method for small samples, as in our case. The maximum number of iterations was fixed to the default value (100). The same decision was made with the minimum change in the expected cell frequencies (default 0.05). Due to the nature of the problem, the authors introduced misclassification costs, which allowed the inclusion of information regarding the relative penalty associated with incorrect classification. In our case, the cost ratio was fixed at 2 to 1, because the key objective is to detect those customers who are prone to write reviews. It is advisable to avoid overfitting of the model by pruning the tree: the tree is grown until stopping criteria are met, and then it is trimmed automatically to the smallest subtree based on the specified maximum difference in risk. The risk value is expressed in standard errors. We have used the default value (1.0). The minimum leaf size has been set to 40.
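To make the role of the 2-to-1 cost ratio concrete, here is a minimal sketch of a minimum-expected-cost decision rule for a single node. It rests on two assumptions that the text does not spell out: that the higher cost (2) applies to misclassifying a likely reviewer as a non-reviewer, and that the predicted category of a node is the one with the lowest expected cost. The function name and the probabilities are illustrative only, and this is not a description of the SPSS internals.

```python
def min_cost_prediction(p_writer: float,
                        cost_missed_writer: float = 2.0,
                        cost_false_alarm: float = 1.0) -> str:
    """Pick the category with the lowest expected misclassification cost.

    p_writer is the estimated probability that a customer in the node
    writes reviews (half/often/always).
    """
    # Predicting "Never/rarely" is wrong with probability p_writer.
    cost_predict_non_writer = p_writer * cost_missed_writer
    # Predicting "Half/often/always" is wrong with probability 1 - p_writer.
    cost_predict_writer = (1.0 - p_writer) * cost_false_alarm

    if cost_predict_writer < cost_predict_non_writer:
        return "Half/often/always"
    return "Never/rarely"

# With a 2:1 ratio the decision threshold drops from 1/2 to 1/3.
for p in (0.30, 0.40, 0.60):
    print(p, min_cost_prediction(p), min_cost_prediction(p, 1.0, 1.0))
```

Under these assumptions, a node where 40% of customers write reviews would be labeled as a reviewer node with the 2-to-1 ratio but not with equal costs, which is how the cost ratio trades specificity for sensitivity.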

The validation method used was split-sample validation: 70% training and 30% testing. The model was generated using a training sample and tested using a hold-out sample. The results are displayed for both training and testing samples.

Following the aforementioned discussion, a DT was built to predict the dependent variable Q3: Propensity to write a review: how often do you write a review after your stay in a hotel? The original 5-point Likert scale was recoded into a binary variable (never/rarely, half/often/always) to improve the interpretability of the results, avoiding branch atomization. This helps prevent the tree from generating too many branches that separate similar categories – such as “never” and “rarely” – which can introduce noise and make it harder to identify the most relevant variables.
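The recoding of Q3 described above can be sketched as a simple mapping. The answer labels follow the 5-point scale listed in Table 1; the function name `recode_q3` and the output labels are illustrative choices, not the coding actually used in SPSS.

```python
# The five Q3 answer options, as listed in the questionnaire description.
Q3_SCALE = ("never", "rarely", "about half of the time", "often", "always")

def recode_q3(answer: str) -> str:
    """Collapse the 5-point Q3 scale into the binary dependent variable."""
    if answer not in Q3_SCALE:
        raise ValueError(f"unexpected Q3 answer: {answer!r}")
    if answer in ("never", "rarely"):
        return "Never/rarely"
    return "Half/often/always"

recoded = {answer: recode_q3(answer) for answer in Q3_SCALE}
for answer, label in recoded.items():
    print(f"{answer:>24} -> {label}")
```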

The 14 variables with a 4-point Likert scale corresponding to Q2 (e.g., OTA1, SMP1) were introduced as independent variables, as well as the 6 sociodemographic questions: gender, age, work status, household situation, educational level, and net monthly income. Again, a recoding process was performed for some of the variables. Work status was reduced to three categories: not working (student, unemployed, or retired); employed; and self-employed or manager. At the same time, the net monthly income variable was also recoded into three categories: 1,001–2,000, 2,001–3,000, and over 3,000. Figure 1 illustrates the model variables.

4
Results

The final DT is displayed in Figure 2, with a depth of two, seven nodes in total, and four terminal nodes. Only Q2_SMP1, Q2_OTA1, and work status are significant and therefore produce splits into nodes. Only three variables were selected because DTs prioritize variables based on their ability to split the data in ways that reduce classification error. Variables that do not lead to substantial improvements in classification are either placed in deeper, less influential branches or excluded altogether. The final value for the risk estimate is 0.438, with a standard error of 0.034.

Figure 2

DT (test sample) of triggers for eWOM hotel reviews.

It is easy to elaborate the rules for different terminal nodes. If we focus on more extreme situations (nodes 3 and 6), the rules are as follows:

Node 3: IF ((Q2_SMP1 ≤ 2) AND (Q2_OTA1 ≤ 3)) THEN Prediction is Never/rarely.

Node 6: IF ((Q2_SMP1 > 2) AND (Work status > 1)) THEN Prediction is Half/often/always.

That is, if no intensive use is reported on either SMP1 or OTA1, then it is highly likely that the traveler is not going to write a hotel review (probability 0.712 in node 3). On the other hand, the intensive use of SMP1 by employed travelers is the profile giving the highest probability (0.619 in node 6) of writing a review.
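The two rules stated above can be written as a small function. This is only a sketch of the rules as given in the text: the numeric coding of work status (1 = not working, higher values = working categories) is an assumption, and the remaining terminal nodes (4 and 5) are not spelled out in the text, so the function returns `None` for them rather than guessing.

```python
from typing import Optional

def predict_from_stated_rules(q2_smp1: int, q2_ota1: int, work_status: int) -> Optional[str]:
    """Apply the node 3 and node 6 rules quoted in the text.

    q2_smp1, q2_ota1: 4-point usage scale (1 never ... 4 the most).
    work_status: assumed coding, 1 = not working, values above 1 = working.
    """
    if q2_smp1 <= 2 and q2_ota1 <= 3:
        return "Never/rarely"        # node 3
    if q2_smp1 > 2 and work_status > 1:
        return "Half/often/always"   # node 6
    return None                      # nodes 4 and 5: rules not stated in the text

print(predict_from_stated_rules(q2_smp1=1, q2_ota1=2, work_status=2))
print(predict_from_stated_rules(q2_smp1=4, q2_ota1=1, work_status=2))
print(predict_from_stated_rules(q2_smp1=4, q2_ota1=1, work_status=1))
```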

Table 2 lists the associated classification tables for both the training and test samples. Above all, it is necessary to look at the test results, because a model can overfit the training sample, in which case the training estimates are likely to be biased (optimistic). The overall percentage of correct classifications (accuracy) of the chosen model on the test sample was 59.7%, with a sensitivity of 65.9% and a specificity of 55.6%. That is, 65.9% of the observations were correctly classified into the group of travelers showing a propensity to write an online review, which is the group of interest. These results are good given the heterogeneity of individuals.

Table 2

Classification table of DT model.

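Classification

| Sample | Observed | Predicted: Never/rarely | Predicted: Half/often/always | Percent correct (%) |
|---|---|---|---|---|
| Training | Never/rarely | 150 | 151 | 49.8 |
| Training | Half/often/always | 75 | 142 | 65.4 |
| Training | Overall percentage | | | 56.4 |
| Test | Never/rarely | 74 | 59 | 55.6 |
| Test | Half/often/always | 30 | 58 | 65.9 |
| Test | Overall percentage | | | 59.7 |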

Growing Method: CHAID.

Source: Data processed from IBM SPSS output.

To validate these results, a comparison was performed against a logistic regression model with forward selection, both a reasonable baseline and a technique that is easy to implement. Table 3 shows the associated classification table, which allows a direct comparison with the proposed DT.

Table 3

Classification table of logistic regression model.

Classification

| Observed | Predicted: Never/rarely | Predicted: Half/often/always | Percent correct (%) |
|---|---|---|---|
| Never/rarely | 380 | 54 | 87.6 |
| Half/often/always | 220 | 85 | 27.9 |
| Overall percentage | | | 62.9 |
Source: Data processed from IBM SPSS output.

The overall percentage of correct classifications given by the logistic regression (62.9%) exceeded the corresponding DT percentage (57.4%, pooling the training and test samples), but the sensitivity was clearly worse: 27.9% vs 65.6%. Because the key objective of the technique is to correctly detect eWOM providers, the DT shows a better result.

There are other metrics that can be calculated from the classification table. In particular, the F1 score stands out in handling imbalanced datasets, as is the case here. It merges precision (the proportion of true positives among all instances that the model has identified as positive) and sensitivity (also known as recall). More specifically, it is the harmonic mean of these two metrics. A higher F1 score indicates a better balance between precision and recall. The F1 score of the DT (56.6%) improves on the result of the logistic regression (38.3%).
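The metrics discussed in this section can be recomputed directly from the counts in Tables 2 and 3. The sketch below treats "Half/often/always" as the positive class; the function name `classification_metrics` is illustrative. Pooling the training and test rows of Table 2 is our own step, added because it appears to reproduce the 57.4% accuracy and 65.6% sensitivity quoted for the DT in the comparison with logistic regression, while the 56.6% F1 score matches the test sample alone.

```python
def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy, sensitivity (recall), specificity, precision and F1 from counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        # Harmonic mean of precision and sensitivity.
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Counts read from Table 2 (DT) and Table 3 (logistic regression).
dt_train = dict(tn=150, fp=151, fn=75, tp=142)
dt_test = dict(tn=74, fp=59, fn=30, tp=58)
dt_pooled = {key: dt_train[key] + dt_test[key] for key in dt_train}
logit = dict(tn=380, fp=54, fn=220, tp=85)

results = {
    "DT (test)": classification_metrics(**dt_test),
    "DT (training + test)": classification_metrics(**dt_pooled),
    "Logistic regression": classification_metrics(**logit),
}
for name, metrics in results.items():
    summary = ", ".join(f"{key} {100 * value:.1f}%" for key, value in metrics.items())
    print(f"{name}: {summary}")
```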

5
Discussion and conclusions

This work proposes a methodology for extracting patterns relating to travelers’ propensity to write eWOM hotel reviews by analyzing potential triggers, such as their behavior on online platforms, as well as their sociodemographic characteristics. The authors developed a DT to identify and predict which hotel customers have the highest probability of writing eWOM.

The proposed model achieved reasonably good results (a sensitivity of 65.9% on the test sample) when predicting customers’ propensity to write online reviews. This allows the definition of a customer profile that is of most interest to hotel managers in the context of eWOM management, as they can offer special treatment to those who are most likely to provide eWOM and who are therefore of greater value to the hotel. The main findings are that customers who use SMP1 intensively to gather information about hotels and who are also employed have the highest probability of writing eWOM reviews. Employed status is recorded as opposed to those who are not working, are self-employed, or are managers. The proposed model also identifies travelers with a lower probability of writing hotel reviews as those with low-intensity use of SMP1 and OTA1 when looking for information about hotels. The interpretation of these results requires caution, since the sampling method is not probabilistic.

Our results differ from those of Fine et al. (2017), who support the influence of age on the probability of writing reviews, as age is not a relevant factor when classifying customers as more likely to write reviews.

Customers need to be qualified to identify their potential for providing eWOM, which can be easily achieved with a short conversation at check-in, asking which platforms they use most to look up hotel information.

The white-box ML technique used allows the results to be interpreted and argued, because the logic of the algorithm is visible. To boost their confidence in the results, the authors tested them against a benchmark model: logistic regression. The proposed DT model improves on the baseline performance of a logistic regression model in predicting customers belonging to the group of eWOM providers. The comparison with this benchmark needs interpretation because, although the logistic regression provides a higher overall percentage of correct classifications (accuracy), it performs poorly on the group of interest (eWOM providers), correctly classifying fewer than 30% of them. A higher overall percentage is achieved because the logistic regression model allocates most users to the never/rarely category, which is the group of non-eWOM writers and the majority in both the sample and the population.

The originality of the study resides in two features of this work: On the one hand, the use of an ML method, DTs, to solve the raised research questions. Although similar methods have been used in the past (Gregoriades et al., 2021; Vermeer et al., 2019), as far as we know, it is the first time it is used to identify potential review providers. On the other hand, the objective is to identify a combination of variables that determine the propensity of hotel customers to write reviews.

6
Theoretical and practical implications

This work has answered the two proposed research questions, identifying the main triggers for writing eWOM reviews and the combination of values that most increases the customer’s propensity to write. This study demonstrates the relevance of both behavioral triggers (such as intensive use of SMP1 for hotel information) and sociodemographic characteristics (specifically employment status) in predicting customers’ propensity to write eWOM hotel reviews. The DT model is effective for identifying and predicting customers with the highest probability of writing eWOM reviews. This suggests the utility of advanced analytical techniques such as ML algorithms in uncovering complex patterns and relationships within consumer data for marketing purposes. This study emphasizes the interpretability and validation of results by comparing the DT model with a logistic regression benchmark. Although logistic regression achieves higher overall accuracy, the DT model offers insights specifically relevant to the group of eWOM providers. This underscores the importance of selecting appropriate statistical tools and benchmarking techniques to assess the performance of predictive models in marketing research.

Practical implications are that hotel managers can predict the probability that their customers will become eWOM providers, which is a key contribution to the hotel’s online image and marketing strategy. There is significant value in knowing that some specific customers are prone to write eWOM reviews and that this can be found out through a brief conversation at the check-in stage. The treatment of customers identified as having a higher propensity to write should aim at excellence, bearing in mind that service satisfaction is a necessary step but is not sufficient to drive them to write reviews. Although the authors do not distinguish between positive and negative reviews, it is clear that hotels’ online reputation is at risk from negative reviews, which could have serious economic consequences (Prayag et al., 2018), while positive reviews stimulate demand. Perez-Aranda et al. (2018) studied hotel managers’ responses to eWOM to manage their respective reputations, developing a model with indications regarding applicable procedures. Gössling et al. (2018) studied the growing competition faced by hotel managers due to ratings and rankings, while at the same time, guests are becoming more aware of the significance of reviews. Their results, based on a sample of hotel managers, suggest that managers intervene strategically to avoid the impact of negative online reviews. Hotel managers use different manipulation strategies, such as encouraging staff, social media, and customers to write positive reviews; collaborating with platforms; writing reviews themselves; enlisting commercial raters to improve reputation; or posting negative reviews of rivals on open sites like RS1 (Gössling et al., 2018).
In our study, we touch on the engagement of customers in the writing of reviews. However, to avoid the managers’ ethical dilemma of whether to interfere with the sign of the reviews, we do not suggest requesting reviews only from customers who are prone to write positive reviews, but reviews of any type. The “manager’s dilemma” has been described as the ethical choice that hotel managers face: because some competitors employ various manipulation strategies, “honest” managers may feel compelled to adopt similar tactics to enhance ratings and rankings under increasing market pressure (Gössling et al., 2019).

Our research provides managers with rich insights into how guests’ profiles, in terms of their use of different online platforms and sociodemographic factors, are relevant, highlighting the importance of their information-seeking behavior (use of SMP1) and working status. Knowledge of the use of platforms provides an early warning signal to hoteliers to improve these aspects.

7
Limitations and future research

A limitation of this study is the use of non-probability sampling, as survey responses were collected through a snowball approach, where initial participants were encouraged to distribute the questionnaire within their own networks. This approach makes the results more difficult to extrapolate to the general population; therefore, they should be taken with caution. The authors also realized that some additional variables would have been useful to measure and include in the survey to improve the accuracy of the study. The technique has provided promising results, and further research can expand the number of variables by identifying the channel through which the booking was made and the customers’ frequency of traveling, as the authors suspect that frequent travelers may be more active in their eWOM provision.

Funding information

Faculty of Commerce and Tourism, Complutense University of Madrid.

Author contributions

Miguel Llorens-Marin: Conceptualization; Investigation; Methodology; Project administration; Resources; Supervision; Validation; Writing – original draft; and Writing – review & editing. Adolfo Hernandez: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Software; Validation; Visualization; Writing – original draft; and Writing – review & editing. Maria Puelles-Gallo: Resources; Software; Supervision; Visualization; Writing – original draft; and Writing – review & editing.

Conflict of interest statement

The authors state no conflict of interest.

Data availability statement

Research data are available upon request from the corresponding author.

DOI: https://doi.org/10.2478/mmcks-2025-0010 | Journal eISSN: 2069-8887 | Journal ISSN: 1842-0206
Language: English
Page range: 81 - 92
Submitted on: Feb 19, 2025
Accepted on: Jun 24, 2025
Published on: Jun 26, 2025
Published by: Society for Business Excellence
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Miguel Llorens-Marin, Adolfo Hernandez, Maria Puelles-Gallo, published by Society for Business Excellence
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.