The Efficient Market Hypothesis (EMH) posits that financial markets are efficient, with asset prices reflecting all available information (Fama, 1970). Yet, the exponential growth of social media has transformed how information is produced, disseminated, and absorbed (Michalak & Kruszewski, 2021). In ultra-high-frequency information environments, investors often face challenges in gathering and processing sufficient information within short time frames to make fully rational decisions consistent with EMH. Platforms such as Twitter generate real-time streams of market-relevant signals but also amplify noise and emotional content (Michalak, 2024b). This raises a critical question: can markets remain efficient when investors face unprecedented volumes of unstructured, emotionally charged information?
Behavioural and psychological research shows that cognitive overload and emotional states strongly shape decision-making, leading investors to rely on heuristics rather than rational evaluation (Ekman & Davidson, 2002; Thaler, 2015). Emotions such as fear, anger, or optimism can spread rapidly through digital networks, influencing collective sentiment and triggering market reactions. Social media thus functions not only as a channel of information but as a driver of sentiment shocks that challenge rational market dynamics (Ekman, 1972; Ekman & Davidson, 2002; Thaler, 2015; Michalak & Kruszewski, 2021).
A growing body of research has linked online sentiment to stock market movements (Bollen et al., 2011; Fischer & Krauss, 2018). However, several limitations persist. Most studies employ either traditional econometric models or machine learning approaches in isolation, leaving their comparative strengths unclear (Mao et al., 2012; Aldridge, 2017; Nagle & Gibbons, 2018; Petrescu & Gherghina, 2020; Ranjan et al., 2021; Dissanayake et al., 2021; Katsafados et al., 2022). In addition, few studies analyse tweets across multiple sentiment dimensions or test how different modelling frameworks capture nonlinear dynamics between online activity and market behaviour. Addressing these gaps is essential to advance both theory and practice in financial forecasting.
This study makes three key contributions. First, it systematically compares the predictive power of Vector Autoregression (VAR), a widely used econometric approach, and Long Short-Term Memory (LSTM) neural networks, a state-of-the-art deep learning method. Second, it integrates multiple sentiment indicators derived from Twitter, capturing not only polarity but also specific emotions. Third, it tests their ability to forecast two core market variables – trading volume and closing price – for major technology firms (Apple and Amazon).
Based on this framework, we formulate two hypotheses: (H1) Twitter activity is significantly associated with market changes, and (H2) LSTM models outperform VAR models in capturing these dynamics. The first hypothesis aims to establish a reliable link between Twitter activity and market variables, addressing the lack of clearly documented interdependencies in the literature, and considers not only positive and negative sentiment but also specific emotion indicators (for example, anger and fear). The second hypothesis addresses the study's objective of evaluating the predictive capabilities of VAR and LSTM models (Laney, 2001; Antweiler & Frank, 2004; Manovich, 2011; Siganos et al., 2014; Ishikawa, 2015; Bello-Orgaz et al., 2016; Reinsel et al., 2017; Michalak & Kruszewski, 2021).
By bridging econometrics, machine learning, and behavioural finance, this research advances understanding of how digital sentiment influences financial markets. Beyond theoretical implications, our findings highlight the potential of social media monitoring as a practical tool for investors, risk managers, and policymakers operating in increasingly information-driven markets.
The relationship between social media communication and stock market dynamics can be examined through multiple interdisciplinary approaches, including behavioural finance, sociology, and economic psychology. These perspectives share the view that financial decisions are not purely cognitive but are profoundly shaped by affective processes that guide attention, judgment, and collective behaviour. From this standpoint, even apparent deviations from the Efficient Market Hypothesis (EMH) can be understood as the outcome of emotional interference with rational information processing.
To provide a stronger theoretical foundation, this study draws on the theory of constructed emotion (Barrett, 2017). According to this theory, emotions are not universal, hardwired responses but contextually constructed experiences that arise from the interaction of interoceptive predictions, cultural concepts, and situational context. Investor reactions to Twitter activity should therefore not be interpreted as direct indicators of sentiment but as affective interpretations shaped by shared financial narratives, prior experiences, and social norms. These constructed emotions activate specific neural and behavioural pathways, potentially driving observable market behaviours such as rapid asset withdrawal, herding, or sudden shifts in trading patterns (Michalak, 2024a).
This process unfolds as follows: social media signals (e.g., tweets, news, trending topics) are perceived and filtered through the investor's internal state, prior experiences, and socially constructed financial concepts. These signals generate discrete emotions—such as fear, optimism, or anger—which are not automatic reactions but rather constructed interpretations. The brain integrates these affective states with predictions about future market outcomes, which then guide attention allocation, risk assessment, and decision-making (Barrett, 2017; Michalak, 2024).
Consequently, emotions can trigger specific trading behaviours: optimism may amplify buying and overvaluation, while fear or anger can precipitate rapid sell-offs, increased volatility, or herding behaviour (Michalak, 2024b). This mechanism links psychological constructs to observable market dynamics, providing a theoretically grounded explanation for departures from EMH and highlighting the role of contextually constructed emotions in shaping financial markets.
While Fama's EMH assumes that investors possess unrestricted cognitive capacity (Fama, 1970), behavioural finance has repeatedly demonstrated that cognitive limitations and emotional influences profoundly shape decision-making. Affective processes, such as the fear, optimism, or anger mentioned above, can distort attention allocation, risk perception, and valuation judgments, generating temporary mispricings in financial markets (Thaler, 2015).
Despite this, much of the empirical literature treats investor sentiment in a monolithic way, often reducing it to a single polarity index (positive vs. negative). This oversimplification neglects the richness of discrete emotions, which can exert distinct and sometimes opposing effects on trading behaviour and market dynamics (Zobal, 2017).
Consequently, a critical gap remains in understanding how discrete, contextually constructed emotions—rather than aggregated sentiment—interact with market efficiency, volatility, and trading behaviour. Addressing this gap requires an integrative framework that bridges behavioural finance with psychologically grounded theories of emotion, such as Barrett's Theory of Constructed Emotion (2017).
To empirically capture these dynamics, a variety of methodological approaches have been applied to model the relationship between social media communication and stock market variables (Bollen et al., 2011; Michalak & Kruszewski, 2021; Zeitun et al., 2023; Jena & Majhi, 2023). Early studies relied on linear methods, such as Pearson correlation and linear regression, or traditional time-series models like ARIMA and Granger causality, which primarily capture linear dependencies. However, both social media and market data exhibit nonlinear, high-dimensional interactions, motivating the use of more sophisticated models, including GARCH, SOFNN (self-organising fuzzy neural networks), and Long Short-Term Memory (LSTM) networks (Bollen et al., 2011).
Vector Autoregression (VAR) has been a popular choice due to its ability to model dynamic lead-lag relationships without requiring strict classification of endogenous and exogenous variables (Nofer & Hinz, 2015; Kumari & Mahakud, 2015). VAR also allows for impulse response analysis, assessing how shocks in Twitter activity propagate to market outcomes (Katsafados et al., 2022). Structural VAR (SVAR) extensions have been applied to account for endogenous interdependencies, such as in the TEPU index for monitoring real-time economic policy uncertainty (Yeşiltaş et al., 2022).
Despite its interpretability, VAR has limitations: it struggles with long-term sentiment dynamics, nonlinear interactions, and noisy social media data, which often include spikes, outliers, and memory effects (Azar & Lo, 2016; Michalak, 2021). In contrast, LSTM networks are particularly well-suited for capturing sequential, nonlinear patterns and long-term dependencies. By processing evolving Twitter discussions, LSTMs can detect complex interactions between constructed emotional signals and market responses (Jena & Majhi, 2023; Sawka, 2023). They also demonstrate resilience to noise and adaptability to structural shifts, offering higher predictive accuracy for market outcomes influenced by affective dynamics (Fischer & Krauss, 2018; Rashid & Tanjim, 2021).
In summary, integrating the behavioural finance perspective with a psychologically grounded theory of constructed emotions suggests that discrete affective responses expressed on social media are key drivers of market dynamics. Methodologically, this calls for models capable of capturing the nonlinear, long-term, and sequential nature of these interactions, positioning LSTM networks as a particularly powerful tool, while VAR remains useful for shorter-term, interpretable analyses.
Following the theoretical framework of constructed emotion (Barrett, 2017), we treat Twitter posts not merely as reflections of sentiment but as contextually constructed affective signals shaped by cultural, social, and situational factors, and we label each tweet with both sentiment and emotion categories.
Let D denote a set of tweets, where each tweet d_n represents a discrete communicative unit. Let X be the feature set, composed of n-grams (contiguous token sequences), and let each tweet d_n be represented as a sparse vector of length m, with each dimension corresponding to a specific feature (Michalak, 2024b). While this formalisation provides a computationally tractable representation of textual data, from the perspective of Barrett's theory of constructed emotion, each token, phrase, or feature may carry affective significance. That is, the cognitive-affective meaning of a tweet is not intrinsic to single words but arises through the context-dependent construction of emotion. Thus, every feature in X has the potential to contribute to the affective interpretation that influences investor perception and behaviour.
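The representation of tweets as sparse n-gram vectors can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: it assumes whitespace tokenisation and the (1, 2) n-gram range reported for the Bag-of-Words model, and the function names (`ngrams`, `build_vocabulary`, `to_sparse_vector`) are hypothetical. In practice a library vectoriser such as scikit-learn's `CountVectorizer(ngram_range=(1, 2))` would play the same role.

```python
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all contiguous n-grams of `tokens` for n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(docs, n_min=1, n_max=2):
    """Map every n-gram seen in the corpus D to a column index (the feature set X)."""
    vocab = {}
    for doc in docs:
        for gram in ngrams(doc.split(), n_min, n_max):
            if gram not in vocab:
                vocab[gram] = len(vocab)
    return vocab


def to_sparse_vector(doc, vocab, n_min=1, n_max=2):
    """Represent one tweet d_n as {feature index: count}; absent features are implicit zeros."""
    counts = Counter(ngrams(doc.split(), n_min, n_max))
    return {vocab[g]: c for g, c in counts.items() if g in vocab}


# Toy corpus D (hypothetical tweets).
D = ["aapl looks strong", "aapl looks weak today"]
X = build_vocabulary(D)
m = len(X)  # dense length of every tweet vector
vectors = [to_sparse_vector(d, X) for d in D]
```

Each tweet is stored as a dictionary of non-zero counts, so its nominal length m equals the vocabulary size while memory grows only with the number of n-grams the tweet actually contains.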
Sentiment analysis of Twitter data was performed using a structured, four-module pipeline encompassing data collection, preprocessing, feature extraction, and classification. Both machine learning-based (Multinomial Naïve Bayes and Support Vector Machines) and lexicon-based approaches (NRC Emotion Lexicon and VADER) were applied to construct multi-dimensional emotion indices that track affective trends for the target companies (Liu, 2012; Patil, Wangikar, & Jayamalini, 2017). The full pseudocode of this procedure, based on Krouska, Troussas, & Virvou (2016), is provided in Appendix A1.
The first module covered dataset selection: 1.6 million tweets from Sentiment140 were used to train the MNB and SVM models, acknowledging the pseudo-label bias inherent in the dataset (Go et al., 2009; Pedregosa et al., 2011; Krouska et al., 2016; Soni et al., 2021; Skwirowski & Zytkowicz, 2023). The second module, data cleaning, reduced noise by removing Twitter-specific tokens (#, @, $), repeated letters, URLs, and stopwords, normalising case, stemming, and handling negations (Agarwal et al., 2011; Effrosynidis et al., 2017). The third module, text vectorisation, compared Bag-of-Words (BoW) and TF-IDF representations using accuracy metrics. The fourth module, classification, performed the final sentiment labelling of company-specific tweets.
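The noise-reduction module can be sketched with regular expressions. The sketch is illustrative only: the stopword and negation lists are hypothetical and deliberately tiny, mentions and cashtags are assumed to be dropped while hashtag words are kept, stemming is omitted, and negation is handled by the common convention of prefixing tokens with `NEG_` until the next punctuation mark.

```python
import re

# Hypothetical, deliberately tiny word lists; a real pipeline would use fuller resources.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of"}
NEGATIONS = {"not", "no", "never", "dont", "cant", "isnt", "wont"}
PUNCTUATION = set(".!?,;")


def clean_tweet(text):
    """Sketch of the noise-reduction module (stemming omitted). Returns a token list."""
    text = text.lower()                                   # normalise case
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)    # remove URLs
    text = re.sub(r"[@$]\w+", " ", text)                  # remove @mentions and $cashtags
    text = text.replace("#", " ")                         # keep hashtag words, drop the symbol
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)           # squeeze repeats: "soooo" -> "soo"
    text = text.replace("'", "")                          # "don't" -> "dont"
    tokens = re.findall(r"[a-z]+|[.!?,;]", text)
    cleaned, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False                               # negation scope ends at punctuation
        elif tok in NEGATIONS:
            negated = True                                # mark the following tokens as negated
        elif tok not in STOPWORDS:
            cleaned.append("NEG_" + tok if negated else tok)
    return cleaned


tokens = clean_tweet("$AAPL is soooo good!!! http://t.co/x #bullish but I don't like the fees.")
```

Marking negated tokens lets a Bag-of-Words model distinguish "like" from "don't like" without any syntactic parsing.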
Naïve Bayes (NB) and Support Vector Machines (SVM) are widely used for short-text classification due to their efficiency and robustness (Bermingham & Smeaton, 2010). NB estimates the probability of a tweet belonging to a specific class, while SVM identifies the optimal separating hyperplane. Both methods operate in high-dimensional feature spaces, where careful feature engineering and parameter optimisation are crucial for performance (Chen et al., 2009; Sindhu et al., 2021).
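To make the probabilistic side concrete, here is a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing. It is a teaching sketch under simplifying assumptions (pre-tokenised input, tokens unseen in training ignored at prediction time), not the study's implementation; production code would typically use scikit-learn's `MultinomialNB` and, for the SVM, `LinearSVC`.

```python
import math
from collections import Counter


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-alpha smoothing over token counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            counts = Counter(tok for d in class_docs for tok in d)
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            # Smoothed log P(token | class) for every token in the vocabulary
            self.log_lik[c] = {t: math.log((counts[t] + self.alpha) / total) for t in self.vocab}
        return self

    def predict(self, doc):
        def score(c):
            # log P(class) + sum of log P(token | class) over known tokens
            return self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=score)


train_docs = [["great", "gain"], ["strong", "buy"], ["weak", "loss"], ["sell", "loss"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(train_docs, train_labels)
```

Working in log space avoids numerical underflow when many small token probabilities are multiplied.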
In parallel, a lexicon-based approach using the NRC Emotion Lexicon and VADER captured discrete emotions beyond simple polarity. The NRC lexicon allowed time-series tracking of emotions derived from Ekman's theory, while VADER is optimised for short social media texts. Domain adaptation and lexicon expansion were applied to account for financial market language (Nascimento & Ferreira, 2019).
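A lexicon-based emotion index can be sketched as a daily share of tweets that contain at least one word tagged with a given label. Both the aggregation rule and the lexicon entries below are assumptions for illustration: `TOY_LEXICON` is hypothetical and its entries are not taken from the real NRC Emotion Lexicon, and VADER scores would instead come from a library such as NLTK's `SentimentIntensityAnalyzer`.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> emotion labels), NOT the real NRC entries.
TOY_LEXICON = {
    "crash": {"fear", "negative"},
    "panic": {"fear", "negative"},
    "fraud": {"anger", "negative"},
    "gain": {"positive"},
    "growth": {"positive"},
    "loss": {"sadness", "negative"},
}


def daily_emotion_shares(tweets_by_day, lexicon):
    """For each day, the share of that day's tweets tagged with each emotion label."""
    result = {}
    for day, tweets in tweets_by_day.items():
        hits = defaultdict(int)
        for tokens in tweets:
            labels = set()
            for tok in tokens:
                labels |= lexicon.get(tok, set())   # union of labels for all words in the tweet
            for label in labels:
                hits[label] += 1                    # count each tweet at most once per label
        result[day] = {label: hits[label] / len(tweets) for label in hits}
    return result


tweets = {"2016-01-04": [["market", "crash", "panic"], ["strong", "growth"],
                         ["loss", "crash"], ["nothing"]]}
shares = daily_emotion_shares(tweets, TOY_LEXICON)
```

Ordering the resulting dictionaries by day yields one time series per emotion, which is the form the VAR and LSTM models consume.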
Despite these methods, challenges persist, including sarcasm, implicit sentiment, and domain-specific jargon, which can impair classification accuracy (Magliani et al., 2016). Addressing these limitations requires advanced context-aware models and domain-adaptive strategies.
The short-term dynamics between Twitter sentiment and stock market variables were modelled using a Vector Autoregressive (VAR) framework, complemented by Granger causality and impulse response analysis (Appendix F1). The lag order of each VAR model was selected using the AIC, BIC, and HQC information criteria (Appendix A2–A5). The stationarity of the variables was assessed jointly with the ADF and KPSS tests; non-stationary variables were first-differenced and then re-tested for stationarity.
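The bivariate Granger test behind these results compares a regression of the target on its own lags with one that also includes lags of the candidate cause. The sketch below computes that F statistic from scratch on synthetic data; it assumes stationary inputs and a lag order p already chosen by an information criterion, the series are invented, and in practice statsmodels' `grangercausalitytests` (with `adfuller` and `kpss` for the stationarity checks) would be used. The statistic is compared with the critical value of an F(p, dof) distribution.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_rss(rows, y):
    """Least-squares fit via the normal equations; returns the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def granger_f(cause, target, p):
    """F statistic for H0: lags 1..p of `cause` add nothing to a regression of
    `target` on a constant and its own p lags."""
    ts = range(p, len(target))
    y = [target[t] for t in ts]
    restricted = [[1.0] + [target[t - i] for i in range(1, p + 1)] for t in ts]
    unrestricted = [row + [cause[t - i] for i in range(1, p + 1)]
                    for row, t in zip(restricted, ts)]
    rss_r, rss_u = ols_rss(restricted, y), ols_rss(unrestricted, y)
    dof = len(y) - len(unrestricted[0])
    return ((rss_r - rss_u) / p) / (rss_u / dof)


rng = random.Random(0)
twitter = [rng.gauss(0, 1) for _ in range(300)]
# Toy volume that reacts to yesterday's Twitter signal plus small noise.
volume = [0.0] + [0.8 * twitter[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 300)]
f_twitter_to_volume = granger_f(twitter, volume, p=2)
f_volume_to_twitter = granger_f(volume, twitter, p=2)
```

Running the test in both directions, as the tables below do, distinguishes unidirectional from bidirectional relationships.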
To capture nonlinear and long-term dependencies, Long Short-Term Memory (LSTM) networks were employed (Keras, n.d.), which are particularly effective for sequential data with extended memory, such as evolving Twitter discussions (Appendix A3).
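What gives an LSTM its extended memory is the gated cell state. The sketch below runs a single scalar LSTM cell forward in plain Python to show the gate arithmetic; the weights are hypothetical placeholders, whereas a Keras `keras.layers.LSTM` layer learns vector and matrix weights during training, so this is an illustration of the mechanism rather than the study's network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; w maps gate -> (input weight, recurrent weight, bias)."""
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("input"))          # how much new information to write
    f = sigmoid(pre("forget"))         # how much of the old cell state to keep
    o = sigmoid(pre("output"))         # how much of the cell state to expose
    g = math.tanh(pre("candidate"))    # candidate values to write
    c = f * c_prev + i * g             # long-term memory (cell state)
    h = o * math.tanh(c)               # short-term output (hidden state)
    return h, c


def run_lstm(sequence, w):
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
    return h, c


# Hypothetical weights: a forget-gate bias of 5.0 keeps f close to 1, so an early
# input survives many later steps in the cell state.
w_memory = {"input": (4.0, 0.0, -2.0), "forget": (0.0, 0.0, 5.0),
            "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
h_last, c_last = run_lstm([1.0] + [0.0] * 9, w_memory)
```

Because the forget gate multiplies the previous cell state, values near 1 let a signal persist across many steps, which is why LSTMs can link an early burst of Twitter activity to later market responses.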
Data for Apple and Amazon were collected from 01/01/2016 to 31/12/2017 using the cashtags $AAPL and $AMZN. This period was selected because Twitter data were publicly accessible via the API, ensuring the completeness and consistency of the dataset. Although the study covers 2016–2017, the methodology is fully replicable and can be applied to other time periods or companies, providing a generalisable framework for examining the relationship between social media activity and stock market dynamics.
During this period, Apple was mentioned in 808,218 tweets and Amazon in 405,758. Both companies showed high coefficients of variation, indicating significant fluctuations in emotional expression across the dataset (Table A1).
The percentage distribution of emotions and sentiment across the total volume of tweets was examined using three approaches: the NRC lexicon (Ekman labels), the VADER lexicon, and a machine learning model.
Using the NRC lexicon, for Apple, 4.99% of messages were classified as negative, with 1.37% expressing anger, 1.19% sadness, and 1.94% fear, while 10.60% were positive. For Amazon, 4.46% of messages were negative, including 1.08% anger, 0.91% sadness, and 1.69% fear, while 11.55% were positive. These results suggest that both companies had a higher share of positive sentiment compared to negative sentiment, with relatively few negative emotions expressed.
Applying the VADER lexicon, for Apple, 32.67% of the messages were classified as positive, 15.74% as negative, and 51.59% as neutral. For Amazon, 42.94% were positive, 14.21% negative, and 44.44% neutral. This indicates that both companies had a large proportion of neutral tweets, while Amazon showed a notably higher share of positive sentiment compared to Apple.
Finally, using the machine learning approach, for Apple, 63.85% of the messages were classified as positive and 36.15% as negative. Similarly, for Amazon, 66.64% of the messages were positive, while 33.36% were negative.
These findings underscore that both companies exhibited a majority of positive sentiment, with Amazon slightly leading in this category.
The Bag-of-Words feature space in the n-gram (1, 2) variant, defined on the Sentiment140 dataset, was selected for the analysis. Ten-fold cross-validation confirmed the stability of the results across folds, yielding an accuracy of approximately 0.7.
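The ten-fold cross-validation can be sketched as follows. The `evaluate` callback is hypothetical (it would train a classifier on the training indices and return accuracy on the held-out fold), and shuffling with a fixed seed is an assumption that guards against a corpus stored in label order. A small standard deviation of the fold accuracies is one way to express stability across folds.

```python
import random
import statistics


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds; yield (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # avoid folds that contain a single label
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_val_accuracy(n, evaluate, k=10):
    """Mean and standard deviation of evaluate(train, test) across the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k)]
    return statistics.mean(scores), statistics.stdev(scores)


folds = list(k_fold_indices(103, k=10))
# Hypothetical evaluator that always reports 0.7 accuracy, just to exercise the loop.
mean_acc, sd_acc = cross_val_accuracy(103, lambda train, test: 0.7)
```

Every observation appears in exactly one test fold, so the averaged accuracy uses all labelled tweets for evaluation without ever testing on training data.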
However, several concerns arise, including topic instability, information leakage between companies due to the use of multiple cashtags, and the presence of spam – all of which may reduce the predictive accuracy of Twitter-based models. Considering these challenges, as well as the instability of the VAR model over the entire sample period, the analysis was divided into semi-annual intervals.
Figures 1–2 illustrate the time series for (a) Apple trading volume and $aapl Twitter volume and (b) Amazon trading volume and $amzn Twitter volume, together with their respective closing prices. Figure 2 suggests the presence of spurious correlation due to trends in the closing prices. Consequently, the closing prices of Apple and Amazon were detrended by fitting a quadratic trend with Ordinary Least Squares (OLS). The Pearson correlation coefficient was then calculated between the residuals of this model and the standardised sentiment indicators. The correlations were statistically insignificant, so no linear relationship was modelled. However, the residuals from the trend models for closing price and volume indicate that changes in price and Twitter volume may share some common patterns during specific periods. In light of these observations, only the Long Short-Term Memory (LSTM) model was estimated for closing prices.

Time series for Twitter volume and trading volume of Apple and Amazon
Source: own preparation

Time series for Twitter volume and closing price of Apple and Amazon
Source: own preparation
In financial time series, trends may represent a fundamental market characteristic, reflecting long-term movements driven by economic factors and market dynamics. Using detrended residuals makes it possible to investigate short-term fluctuations and potential nonlinear correlations that are independent of the market's overarching trajectory. The LSTM models therefore include the residuals as a variable.
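The detrending check can be sketched in plain Python. Under stated assumptions: the time index is rescaled to [0, 1] for numerical stability (this changes the coefficients but not the fitted trend), the normal equations are solved by Cramer's rule, and the synthetic series are invented. The example reproduces the concern raised above: two series that merely share a trend correlate strongly in levels, while the correlation largely disappears once the quadratic trend is removed from the price.

```python
import math
import random


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, rhs):
    """Solve the 3x3 system a @ b = rhs by Cramer's rule."""
    d = det3(a)
    solution = []
    for j in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][j] = rhs[i]
        solution.append(det3(m) / d)
    return solution


def quadratic_detrend(y):
    """OLS fit of y = b0 + b1*s + b2*s^2 with s = t/(n-1); returns the residuals."""
    n = len(y)
    s = [k / (n - 1) for k in range(n)]
    cols = [[1.0] * n, s, [v * v for v in s]]
    xtx = [[sum(ci[k] * cj[k] for k in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[k] * y[k] for k in range(n)) for ci in cols]
    b = solve3(xtx, xty)
    return [y[k] - (b[0] + b[1] * s[k] + b[2] * s[k] ** 2) for k in range(n)]


def pearson(a, b):
    """Pearson correlation; unaffected by standardising either series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(1)
n = 200
price = [100 + 0.3 * k + 0.01 * k * k + rng.gauss(0, 5) for k in range(n)]
sentiment = [0.02 * k + rng.gauss(0, 1) for k in range(n)]  # shares only a trend with price
r_raw = pearson(price, sentiment)
r_detrended = pearson(quadratic_detrend(price), sentiment)
```

Because OLS residuals are orthogonal to the fitted trend terms, any correlation that was driven purely by the common trend is removed from the residual series.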
Due to the significant volatility observed in both Twitter and stock market trading volumes, the entire research period (2016–2017) was segmented into semi-annual intervals, with interdependencies assessed across various configurations, including half-yearly and full two-year periods. This methodological decision was informed by the suboptimal statistical performance of the models estimated over the full sample, particularly the presence of autocorrelation and the instability of impulse response functions.
Table 1 presents the results of the Pearson correlation analysis between trading volume and the Twitter indicators. The findings provide evidence that Twitter activity, as measured by various sentiment indicators, is statistically significantly correlated with stock trading volume. Indicators of fear and anger, as well as of positive and negative sentiment, exhibit significant correlations for Amazon and Apple in different time periods. This supports Hypothesis 1 (H1), that there is a statistically significant relationship between Twitter activity and the stock market when trading volume is the outcome variable. Pearson correlation was also used as a criterion for selecting variables for the VAR model and the Granger causality analysis, since both methods assess linear relationships between variables. In contrast, for the LSTM model, which captures nonlinear dynamics, all variables were considered as candidate inputs without prior exclusion based on the correlation analysis.
Pearson linear correlation coefficients for sentiment across different research periods (variable: volume)
| Indicator | AAPL, full period | AMZN, full period | AAPL, first half of 2016 | AMZN, first half of 2016 | AAPL, first half of 2017 | AMZN, first half of 2017 |
|---|---|---|---|---|---|---|
| Critical value (5%, two-tailed) | 0,086 | 0,086 | 0,172 | 0,173 | 0,174 | 0,174 |
| Fear | 0,22 | 0,38 | 0,00 | 0,44 | 0,30 | 0,54 |
| Anger | 0,21 | 0,00 | 0,11 | 0,05 | 0,30 | 0,15 |
| negative VADER | 0,57 | 0,50 | 0,38 | 0,46 | 0,50 | 0,71 |
| negative NRC | 0,49 | 0,51 | 0,31 | 0,65 | 0,40 | 0,73 |
| neutral VADER | 0,56 | 0,57 | 0,29 | 0,41 | 0,50 | 0,77 |
| positive NRC | 0,40 | 0,30 | 0,26 | 0,29 | 0,40 | 0,59 |
| positive VADER | 0,56 | 0,44 | 0,36 | 0,46 | 0,60 | 0,70 |
| Sadness | 0,29 | 0,40 | 0,37 | 0,53 | 0,30 | 0,52 |
| Twitter volume | 0,54 | 0,52 | 0,36 | 0,46 | 0,50 | 0,74 |
| ML positive | 0,53 | 0,48 | 0,14 | 0,38 | 0,40 | 0,72 |
| ML negative | 0,52 | 0,55 | 0,16 | 0,53 | 0,30 | 0,74 |
Source: own preparation
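The correlation screening used to pre-select variables for the VAR models can be sketched as follows. This is an illustration under assumptions: the critical value uses the standard conversion r* = t / sqrt(n − 2 + t²), with 1.96 approximating the Student-t quantile (close for samples of around a hundred or more observations), the toy series are invented, and `screen_indicators` is a hypothetical helper. For n = 504, for example, the formula gives roughly 0.087.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def critical_r(n, t=1.96):
    """Approximate two-tailed 5% critical value of Pearson's r for n observations."""
    return t / math.sqrt(n - 2 + t * t)


def screen_indicators(target, candidates):
    """Return {name: r} for candidate series whose |r| with the target exceeds the critical value."""
    crit = critical_r(len(target))
    kept = {}
    for name, series in candidates.items():
        r = pearson(target, series)
        if abs(r) > crit:
            kept[name] = r
    return kept


volume = [float(k) for k in range(20)]                         # toy trading-volume series
candidates = {
    "tracking": [2.0 * v + 1.0 for v in volume],               # moves with volume
    "alternating": [float((-1) ** k) for k in range(20)],      # unrelated zig-zag
}
kept = screen_indicators(volume, candidates)
```

The screen only applies to the linear VAR and Granger analyses; a nonlinear model such as an LSTM can exploit inputs that fail it.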
Tables 2–5 present the results of the analysis conducted using VAR models and LSTM neural networks. The performance of both approaches was evaluated based on the Mean Absolute Error (MAE) calculated on the test dataset. The analysis was carried out in three temporal variants to account for the autocorrelation issue inherent in the VAR model and the short-lived nature of topics trending on Twitter.
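The evaluation logic (a chronological hold-out and MAE on the test set) can be sketched as follows. The window length, test share, and toy data are assumptions, and the persistence forecast ("tomorrow equals today") is a hypothetical stand-in for a fitted model; it is not the study's "basic model", which is an LSTM without Twitter variables.

```python
def make_windows(rows, target_col, window):
    """Supervised pairs: the previous `window` daily feature rows -> the next day's target."""
    X, y = [], []
    for t in range(window, len(rows)):
        X.append(rows[t - window:t])
        y.append(rows[t][target_col])
    return X, y


def chronological_split(X, y, test_share=0.2):
    """Hold out the most recent observations as the test set (no shuffling, so no look-ahead)."""
    cut = int(round(len(X) * (1 - test_share)))
    return X[:cut], y[:cut], X[cut:], y[cut:]


def mae(actual, predicted):
    """Mean absolute error, in the units of the target (e.g. shares traded)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy daily rows: [trading volume, fear share]; values are illustrative only.
volumes = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 21, 20, 22]
rows = [[float(v), 0.0] for v in volumes]
X, y = make_windows(rows, target_col=0, window=3)
X_train, y_train, X_test, y_test = chronological_split(X, y, test_share=0.2)
persistence = [window[-1][0] for window in X_test]   # hypothetical reference forecast
test_mae = mae(y_test, persistence)
```

Because MAE is expressed in the units of the target, values for trading volume (shares) and closing prices are not comparable with each other, only across models for the same target and period.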
Results of the Granger causality analysis for Apple trading volume variable
| Time period | Model | Twitter variable | Granger causality: Twitter -> Volume | Granger causality: Volume -> Twitter |
|---|---|---|---|---|
| full period | VAR(5) | Twitter volume | * | * |
| | VAR(6) | negative VADER | * | |
| | VAR(5) | positive VADER | * | |
| first half of 2016 | VAR(2) | negative VADER | * | |
| | VAR(2) | Twitter volume | * | |
| | VAR(2) | Fear | * | |
| | VAR(3) | negative NRC | * | |
| | VAR(3) | positive NRC | * | |
| | VAR(2) | neutral VADER | * | |
| | VAR(2) | positive VADER | * | |
| | VAR(2) | positive ML | * | |
| | VAR(2) | negative ML | * | |
| first half of 2017 | VAR(2) | negative VADER | * | * |
| | VAR(1) | negative NRC | * | |
| | VAR(4) | neutral VADER | * | * |
| | VAR(1) | Anger | * | |
| | VAR(1) | positive VADER | * | |
| | VAR(2) | negative ML | * | * |
| | VAR(2) | positive ML | * | * |
Source: own preparation
Results of the Granger causality analysis for the Amazon trading volume variable
| Time period | Model | Twitter variable | Granger causality: Twitter -> Volume | Granger causality: Volume -> Twitter |
|---|---|---|---|---|
| full period | VAR(2) | neutral VADER | * | |
| | VAR(3) | negative ML | * | * |
| first half of 2016 | VAR(2) | Twitter volume | * | |
| | VAR(2) | negative VADER | * | |
| | VAR(2) | Fear | * | |
| | VAR(2) | positive NRC | * | |
| | VAR(2) | neutral VADER | * | |
| | VAR(2) | positive VADER | * | |
| | VAR(4) | Anger | * | |
| | VAR(2) | negative ML | * | |
| | VAR(2) | negative ML | * | |
| first half of 2017 | VAR(3) | Anger | * | * |
Source: own preparation
MAE results for the Apple and Amazon volume from VAR and LSTM approaches (test dataset)
| Time period | Model variant | Apple VAR | Apple LSTM | Amazon VAR | Amazon LSTM |
|---|---|---|---|---|---|
| first half of 2016 | basic model | – | 52084210,079 | – | 30415580,280 |
| | Fear | 6475000 | 43234472,137 | 1710000 | 29134831,193 |
| | Anger | – | 44926837,853 | 2333900 | 27106026,803 |
| | negative VADER | – | 48940743,521 | 1302400 | 28507242,885 |
| | negative NRC | 11669000 | 47438402,462 | – | 27761003,405 |
| | neutral VADER | 7935700 | 47392150,783 | 110190 | 31538788,185 |
| | positive NRC | 5272400 | 46575716,841 | 1771500 | 30197920,410 |
| | positive VADER | – | 46123079,248 | 219890 | 31073486,591 |
| | Sadness | – | 45630371,911 | – | 27609196,773 |
| | Twitter volume | 7894500 | 45452760,479 | 2821900 | 29480840,315 |
| | ML positive | 5065700 | 48447840,465 | 201980 | 28161706,284 |
| | ML negative | 6480200 | 45159658,721 | 748240 | 29443364,882 |
| first half of 2017 | basic model | – | 29959620,193 | – | 19816644,131 |
| | Fear | – | 30032320,014 | – | 19610157,993 |
| | Anger | 27606000 | 29989363,986 | 9409900 | 19193604,890 |
| | negative VADER | 29061000 | 29164579,531 | – | 19828186,097 |
| | negative NRC | 26228000 | 29378243,531 | – | 19651018,103 |
| | neutral VADER | 28722000 | 29543515,048 | – | 19624056,441 |
| | positive NRC | 27437000 | 29435291,269 | – | 19873546,559 |
| | positive VADER | 29880000 | 29613845,434 | – | 19708110,731 |
| | Sadness | – | 29202756,331 | – | 19887700,586 |
| | Twitter volume | 27795000 | 29200650,234 | – | 19666939,683 |
| | ML positive | 28905000 | 29288276,166 | – | 19862915,138 |
| | ML negative | – | 29480424,662 | – | 20186129,690 |
| full period | basic model | – | 37060274,722 | – | 49244981,060 |
| | Fear | – | 36283377,928 | – | 22495202,131 |
| | Anger | – | 37418912,444 | – | 22924401,013 |
| | negative VADER | – | 37068676,291 | – | 22832473,868 |
| | negative NRC | – | 37277158,450 | – | 22121621,013 |
| | neutral VADER | – | 36841175,338 | 8603500 | 23183559,170 |
| | positive NRC | – | 37187736,091 | – | 22731837,658 |
| | positive VADER | 6703500 | 38351343,147 | – | 22735923,064 |
| | Sadness | – | 36132565,891 | – | 22154059,560 |
| | Twitter volume | – | 36132565,891 | – | 22725973,268 |
| | ML positive | – | 36672397,669 | – | 22310880,027 |
| | ML negative | – | 35741492,741 | 9708500 | 22808249,226 |
Source: own preparation
MAE results for Apple and Amazon closing prices with LSTM approaches (test dataset)
| Indicator | AAPL, full period | AMZN, full period | AAPL, first half of 2016 | AMZN, first half of 2016 | AAPL, first half of 2017 | AMZN, first half of 2017 |
|---|---|---|---|---|---|---|
| Base model | 131839949 | 179258627 | 176138673 | 274137855 | 173356700 | 252350198 |
| Fear | 125483563 | 180665696 | 164580851 | 246518980 | 211031605 | 302579474 |
| Anger | 125998230 | 188602498 | 172433139 | 249338907 | 171247523 | 273984994 |
| negative VADER | 128457127 | 183451424 | 180822334 | 279399172 | 176222412 | 229997950 |
| negative NRC | 129982302 | 181365827 | 173876262 | 256298833 | 174086104 | 208780619 |
| neutral VADER | 126443868 | 186080492 | 180506883 | 256098881 | 239225325 | 256152057 |
| positive NRC | 132827363 | 180629995 | 176500689 | 271267606 | 193072633 | 188347131 |
| positive VADER | 133481260 | 183951456 | 190247008 | 241919926 | 149474564 | 199168532 |
| Sadness | 133854686 | 181299519 | 186525792 | 259533810 | 176089131 | 209214699 |
| Twitter volume | 128999042 | 186141427 | 204244861 | 263932412 | 166108240 | 25818537 |
| ML positive | 124848073 | 180946633 | 184088680 | 255719195 | 169194595 | 18691250 |
| ML negative | 127055620 | 178378199 | 181395180 | 230176491 | 1847151434 | 195771334 |
Source: own preparation
The clustering of discussions around specific company-related events generated short-term trends. However, changes in discussion dynamics did not consistently correlate with stock price movements, shifts in discussion sentiment, or external events affecting the company. During periods of declining discussion activity following upward trends, conversations often shifted toward general market commentary or spam. Such content frequently included multiple cashtags, resulting in information spillover across different companies.
In the VAR analysis, the full study period exhibited excessive volatility, and the models lacked robust statistical properties even after additional exogenous variables, such as rates of return, were introduced. Granger causality analysis for the different study periods primarily shows a unidirectional influence from Twitter on the stock market, with some exceptions. The sadness indicator did not Granger-cause trading volume in any of the analysed periods. Locally, anger and fear show a unidirectional influence on Twitter. However, consistent patterns across indicator groups are difficult to identify, which suggests that the impact on trading volume depends on a range of influencing factors, leading to different dynamics within the systems.
It should be noted that Granger causality was observed mainly for indicators associated with a higher volume of tweets (negative and positive sentiment). In Apple's case, NRC emotion indicators such as sadness, anger, and fear showed weaker predictive properties.
The sadness indicator did not show any predictive capability in the linear analysis for any period. In terms of Granger causality, Amazon exhibits weaker linear relationships than Apple. In principle, more intensive discussion on Twitter and a more visible presence on social media should lead to stronger dependencies, and Apple was mentioned roughly twice as often as Amazon in the dataset (808,218 versus 405,758 tweets).
Additionally, Michalak (2024a) demonstrated that the choice of sentiment analysis method affects the final causality results. This is evident in the significance of positive versus negative sentiment, depending on the sentiment analysis method used.
Low MAE values for the ‘fear’ and ‘anger’ labels indicate that these variables substantially improve volume forecasts, particularly for Amazon (Table 4). Although commonly perceived as negative emotions, from a psychological standpoint they are considered fundamental emotions. According to evolutionary psychology, these emotions developed as part of adaptive behavioural responses essential for survival: they increase focus and trigger fight-or-flight responses to stimuli. These are powerful emotions with the potential to influence market trends if experienced by a large enough group of people.
Apple exhibits greater variability in MAE results, suggesting that discussions about Apple are more diverse in terms of topics, tone, and data quality. The ‘neutral VADER’ and ‘positive NRC’ variables show higher MAE compared to those related to negative emotions, implying that they have a smaller impact on financial markets. This conclusion aligns with the theory of information asymmetry, where negative news tends to have a more substantial influence on investment decisions.
In the first half of 2016, adding the Twitter variable to the neural network improved the MAE for all Apple indicators and for 9 out of 11 Amazon indicators. However, the results for Apple during this period were not satisfactory, suggesting that Twitter did not significantly enhance the model's ability to predict trading volume. The conclusions for Amazon are similar. Over the full period, Apple did not show satisfactory results, whereas the predictions for Amazon improved significantly.
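The count of improved indicators can be checked directly against the LSTM values for the first half of 2016 in the MAE table for trading volume. The snippet below copies those values (decimal commas converted to points) and lists the indicators whose MAE falls below the basic model; `improvements` is a hypothetical helper, and a lower point MAE on one test set is not by itself a significance test.

```python
# LSTM test-set MAE for trading volume, first half of 2016, copied from the
# MAE table for trading volume (decimal commas converted to decimal points).
BASELINE = {"Apple": 52084210.079, "Amazon": 30415580.280}
WITH_TWITTER = {
    "Apple": {
        "Fear": 43234472.137, "Anger": 44926837.853, "negative VADER": 48940743.521,
        "negative NRC": 47438402.462, "neutral VADER": 47392150.783,
        "positive NRC": 46575716.841, "positive VADER": 46123079.248,
        "Sadness": 45630371.911, "Twitter volume": 45452760.479,
        "ML positive": 48447840.465, "ML negative": 45159658.721,
    },
    "Amazon": {
        "Fear": 29134831.193, "Anger": 27106026.803, "negative VADER": 28507242.885,
        "negative NRC": 27761003.405, "neutral VADER": 31538788.185,
        "positive NRC": 30197920.410, "positive VADER": 31073486.591,
        "Sadness": 27609196.773, "Twitter volume": 29480840.315,
        "ML positive": 28161706.284, "ML negative": 29443364.882,
    },
}


def improvements(baseline, variants):
    """Indicators whose MAE is below the baseline, mapped to the relative MAE reduction in %."""
    return {name: 100.0 * (baseline - mae) / baseline
            for name, mae in variants.items() if mae < baseline}


for company in BASELINE:
    better = improvements(BASELINE[company], WITH_TWITTER[company])
    print(f"{company}: {len(better)} of {len(WITH_TWITTER[company])} indicators lower the MAE")
```

The two Amazon indicators that do not beat the basic model are neutral VADER and positive VADER.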
No baseline model was constructed as a benchmark for the VAR models; this was a deliberate decision by the author. Benchmarking VAR would require an autoregressive (AR) model, whose estimates do not depend on multiple variables. This lack of dependence on the Twitter variables was the main reason for excluding such a benchmark, as it would have made the comparison difficult and the conclusions unstable.
Table 5 presents the results for closing prices. The findings suggest that incorporating sentiment variables did not consistently improve predictive accuracy. However, negative emotions may be critical for improving forecasts, with their impact varying by period and company. Negative emotions generally contribute more to improving predictions for both companies, especially over the full period and in the first half of 2016. For Apple, the inclusion of emotion-related variables does not always improve forecast accuracy, particularly in the first half of 2017. In contrast, Amazon shows more stable results.
The impulse response analysis considered shocks originating from Twitter and affecting trading volume. The main conclusion from the trajectory of the response function is that the effect of an impulse is short-lived, lasting a few days, which may correspond to the typical duration of discussion of a given topic. The decline in volume following a shock from Twitter is a complex issue; potential reasons for this direction of response include negative perceptions of the information and the activation of withholding patterns. The shock could increase caution, resulting in a wait-and-see approach pending new information. Nevertheless, the decline in volume is a short-term effect, lasting about 1–2 days, that does not lead to long-term changes and reflects the natural dynamics of the market. The stability of the results indicates that the market tends to absorb the shock and return to normal levels of activity.
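The short-lived response can be illustrated with the impulse response of a stable VAR(1), where the response h periods after a one-off shock e_0 is A^h e_0. The coefficient matrix below is hypothetical (chosen so that a Twitter shock lowers volume, as described above), the responses are non-orthogonalised, and a VAR(p) with p > 1 would use its companion form; statsmodels' `VARResults.irf` computes the fitted-model equivalent, including orthogonalised responses.

```python
def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def impulse_response(a, shock, horizon):
    """Non-orthogonalised impulse responses of y_t = A y_{t-1} + e_t: A^h e_0 for h = 0..horizon."""
    responses = [list(shock)]
    for _ in range(horizon):
        responses.append(mat_vec(a, responses[-1]))
    return responses


# Hypothetical coefficients; variable order is [Twitter indicator, trading volume].
A = [[0.4, 0.0],
     [-0.3, 0.2]]
irf = impulse_response(A, shock=[1.0, 0.0], horizon=5)
for h, (tw, vol) in enumerate(irf):
    print(f"h={h}: twitter={tw:+.4f} volume={vol:+.4f}")
```

Because both eigenvalues of this lower-triangular A (0.4 and 0.2) lie inside the unit circle, the volume response peaks one period after the shock and decays towards zero within a few periods, mirroring the pattern described in the text.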
The results suggest that incorporating variables representing emotions could have statistically significant implications for improving forecasting accuracy. However, there are no clear patterns indicating which specific variables are significant, which underscores the need for continuous monitoring of social media platforms. The variability of statistical significance over time depends on numerous factors, including those related to the immediate and broader environment of the enterprise, and particularly on the importance of discussions about events within that environment. These findings are consistent with the existing literature: researchers have demonstrated a connection between stock markets and social media, yet identifying this relationship requires analysis across multiple dimensions and is often challenging.
VAR-class models appear to be better suited to modelling stock trading volume than closing prices. Closing prices and investment returns are difficult to explain through a linear relationship with the volume of emotions, so it is recommended that this relationship be studied with nonlinear models, such as LSTM networks or GARCH-class models. Consequently, simple models such as linear regression are not recommended for establishing this connection, although such approaches appear in the literature: non-stationarity of the time series, long-memory processes, and volatility clustering make regression unsuitable.
