This study explores whether R programming can transform unstructured qualitative social media data into a quantitative format suitable for econometric modelling. It specifically examines how elements such as text, emojis, and sentiment from Reddit and X (formerly Twitter) can be converted into variables for regression analysis. To enhance the predictive power of traditional financial models with alternative data sources, the paper sets out step-by-step technical guidelines: scripting API calls to extract data from Reddit and X, cleaning and tokenising the text, and incorporating the resulting variables into regression models in R. The study addresses the growing need in financial economics to incorporate alternative data streams by offering a structured, replicable process for transforming high-volume, unstructured online content into statistically valid variables, thereby bridging the gap between qualitative market sentiment and quantitative modelling.
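As an illustration of the text-to-variable step described above, the following is a minimal base-R sketch, not the study's own script. The `posts` data frame, its column names, and the helper `count_pattern` are hypothetical, the three posts are invented toy examples rather than data from Reddit or X, and a UTF-8 R session is assumed.

```r
# Toy posts standing in for API output (hypothetical data, UTF-8 session assumed).
posts <- data.frame(
  date = as.Date(c("2021-01-25", "2021-01-25", "2021-01-26")),
  text = c("GME to the moon \U0001F680\U0001F680",
           "holding BBY",
           "\U0001F680 buy GME"),
  stringsAsFactors = FALSE
)

rocket <- "\U0001F680"  # rocket emoji

# Count non-overlapping occurrences of a fixed string in each element of x.
# gregexpr() returns -1 when there is no match, so only positive positions count.
count_pattern <- function(x, pattern) {
  matches <- gregexpr(pattern, x, fixed = TRUE)
  vapply(matches, function(m) sum(m > 0), integer(1))
}

# Emoji frequency per post.
posts$rockets <- count_pattern(posts$text, rocket)

# Simple tokenisation: lower-case, then split on anything that is not a letter.
tokens <- strsplit(tolower(posts$text), "[^a-z]+")
posts$gme_mentions <- vapply(tokens, function(tk) sum(tk == "gme"), integer(1))

# Aggregate to one row per day so the counts can be merged with daily returns.
daily <- aggregate(cbind(rockets, gme_mentions) ~ date, data = posts, FUN = sum)
print(daily)
```

The daily counts form an ordinary numeric column that can be merged by date with a table of asset returns.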
Focusing on the methodology and R scripts, this research adopts a quantitative approach: qualitative social media data are transformed into a format suitable for multiple linear and instrumental variable regression models, which are then used to assess the effect of social media signals on asset prices, with GameStop (GME) and Best Buy (BBY) as case studies. The process is designed to be reproducible and includes open-source code, enhancing transparency and applicability in both academic and professional financial data analysis.
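The multiple linear regression step can be sketched in base R as follows. This is an illustration under stated assumptions, not the paper's specification: the data are simulated, and the variable names `stock_ret`, `market_ret`, and `rockets`, as well as the coefficients used to generate them, are hypothetical.

```r
# Simulated daily data (hypothetical; not GME or BBY observations).
set.seed(1)
n <- 60
market_ret <- rnorm(n, mean = 0, sd = 0.01)  # market index return
rockets    <- rpois(n, lambda = 5)           # daily rocket-emoji count
# Data-generating process chosen only for illustration.
stock_ret  <- 0.5 * market_ret + 0.002 * rockets + rnorm(n, mean = 0, sd = 0.01)

panel <- data.frame(stock_ret, market_ret, rockets)

# Multiple linear regression: asset return on a market factor plus the emoji signal.
fit <- lm(stock_ret ~ market_ret + rockets, data = panel)

# Coefficient table with estimates, standard errors, t-values, and p-values.
coef_table <- summary(fit)$coefficients
print(coef_table)
```

An instrumental variable version would replace `lm()` with a two-stage estimator and requires a valid instrument for the sentiment variable, which this sketch does not attempt to supply.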
The findings demonstrate that qualitative social media data can be quantified for financial analysis: the data were successfully extracted, cleaned, and used in regression analysis. The results show that traditional market indicators fail to explain GME’s price shifts, while the frequency of rocket emojis (interpreted as speculative sentiment) was statistically significant. BBY’s returns, by contrast, aligned more closely with market and industry indices, suggesting a lower influence of private sentiment.
The research provides a replicable method for integrating social media data into econometric models, contributing new tools for analysing market sentiment. By adapting classical financial models to modern data sources, the paper opens new directions for asset pricing research. The accompanying technical tools, written in R for econometric analysis, are useful to both academics and practitioners.
© 2025 Alexey Litvinenko, Saarinen Samuli, Anna Litvinenko, published by University College of Economics and Culture
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.