Sentiment analysis (SA) is an application of natural language processing (NLP) in which subjective information is extracted from text data [1, 2]. Nowadays, people worldwide share their opinions with each other on social media. The main objective of SA is to analyze the sentiments of users based on social media text data, obtained from tweets in the form of unstructured data [3,4,5]. Text-based SA for social media is performed at three levels: sentence-level, aspect-level, and document-level. Sentence-level classification determines, for a given sentence with several words, whether the text falls into a positive, negative, or neutral category [6,7,8]. Aspect-level analysis examines the text with respect to various aspects and classifies each as positive, neutral, or negative [9]. Lastly, document-level SA analyzes all the textual data and sentences in a user's document to deliver the overall sentiment of the entity rather than the sentiment of individual aspects.
Nowadays, for analyzing the sentiments expressed by people in tweets, artificial intelligence techniques are widely used to categorize sentiments by learning contextual information about the texts. Initially, machine learning (ML) methods were utilized for analyzing sentiments in tweets, but these methods failed to provide good classification and accurate results. Due to a lack of relevant features, such models have difficulty differentiating the sentiments efficiently [10–11]. Thus, deep learning (DL), a subset of ML, is used to classify the sentiments in texts accurately by extracting the relevant features. In DL methods, a neural network with several layers can learn complex representations of sentiments, which helps to analyze the sentiments effectively [12–13]. However, existing DL methods struggle to precisely analyze and categorize the positive, negative, and neutral classes of emotions in tweets, due to a lack of relevant information about these emotion classes; moreover, similar characteristics shared by the positive, negative, and neutral classes lead to misclassification. To overcome this limitation, a reductive bias-based gated recurrent unit (RD-GRU) is proposed to analyze the sentiments in Twitter data by differentiating the positive, negative, and neutral classes efficiently [14–15]. The main contributions of this research are given below.
Figure 1 illustrates SA as an NLP methodology employed to discern emotions and opinions from textual data, particularly from social media platforms such as Twitter. It functions at three tiers: sentence-level, aspect-level, and document-level, each concentrating on a distinct granularity of sentiment classification. Initially, ML techniques were employed but encountered challenges with accuracy owing to inadequate feature extraction. DL methodologies, especially neural networks such as gated recurrent units (GRU), enhance performance by discerning intricate patterns in textual data. Nonetheless, DL models encounter difficulties in differentiating between similar sentiment categories. A novel RD-GRU model is developed, incorporating a reductive bias mechanism into the GRU to prioritize sentiment-laden words (e.g., adjectives, adverbs, negations). This improves the model's capacity to precisely categorize sentiments in tweets as positive, negative, or neutral.
GRU captures long-term dependencies and temporal patterns in tweet sequences effectively, enabling the model to understand sentiment flow over time.
The reductive bias technique is incorporated with GRU's update gate to emphasize sentiment-rich components such as adjectives, adverbs, or negations.
The remaining part of this research is organized as follows: Section II explains the literature review. Section III describes the methodologies implemented for this research. Section IV illustrates the experimental results. Section V concludes the paper.

Methodology of reductive bias in gated recurrent SA of Twitter data. RD-GRU, reductive bias-based gated recurrent unit; SA, sentiment analysis.
Halawani et al. [16] developed an NLP model along with dimensionality reduction based on support vector machine, k-nearest neighbor, and Naive Bayes methods for SA for social media tweets. The developed ML models were employed individually to analyze the sentiments in the tweets from social media. However, the feature selection technique utilized in the developed SA model has poor search ability, which impacts the selection of relevant features and reduces the model's performance.
Aslan et al. [17] presented a model that utilizes DL techniques for SA of the Twitter dataset. The presented model utilized a convolutional neural network (CNN) and long short-term memory (LSTM) to analyze the sentiments from tweets. The advantage of the presented DL model is that a self-attention mechanism helps the model focus more on specific information, which enhances the classification performance of sentiments. However, the presented DL model was unable to analyze certain words due to multiple meanings for the same word, thus leading to poor analysis.
Parveen et al. [18] designed a sentiment classification model for a social media text dataset based on a hybrid gated attention network. The hybrid model was designed with the White Shark optimizer to analyze and categorize the various sentiments in tweets. The main advantage of the designed hybrid gated attention recurrent network is its use of term-weighting-based feature extraction and the White Shark optimizer, which selects features based on polarity to enhance the performance of the hybrid model. However, the designed hybrid gated attention recurrent network fails to differentiate positive, negative, and neutral sentiments, which leads to inaccurate results.
Vidyashree and Rajendra [19] presented an SA model for the Twitter dataset based on stochastic gradient descent optimization with a stochastic gate neural network. The presented gate neural network model, incorporated with the gradient descent optimization method, was utilized to analyze the sentiments effectively. The advantage of the presented gate neural network model was that stochastic gradient optimization enhanced its effectiveness. However, the presented gated neural network model failed to analyze sentiment for certain words, because those words shared similar characteristics with other sentiments.
Aslan et al. [20] designed a CNN-based SA model for the social media Twitter dataset. The CNN model, which incorporates an arithmetic optimization model, is designed for text classification of tweets and addresses existing limitations by detecting and eliminating stop words. The arithmetic optimization model fine-tuned the developed model, enhancing its detection performance. However, the designed CNN model with the arithmetic optimization model still struggles to accurately classify emotions in SA due to irrelevant features.
The main objective of the proposed DL method is to analyze the sentiments/emotions present in tweets from social media. The proposed framework involves five phases: dataset, preprocessing, feature extraction, feature selection, and classification. Figure 2 presents the block diagram of DL-based SA for social media text. First, the tweets from social media are acquired from two Twitter datasets, Sentiment140 and the Twitter API dataset, and then preprocessed by various techniques to transform the text into a useful format by removing unnecessary words. After preprocessing, the transformed texts are passed to log term frequency-modified inverse class frequency (LTF-MICF) for feature extraction based on a term weighting process. Next, the extracted features are passed to the proposed feature selection process, which selects the optimal features. Finally, with the selected features as input, the proposed DL-based classifier model analyzes and categorizes the sentiments present in the tweets, which are social media texts.

Block diagram for SA for social media texts. LTF-MICF, log term frequency-modified inverse class frequency; SA, sentiment analysis.
In this research, Twitter-based social media datasets, namely Sentiment140 and the Twitter API dataset, are used as input for DL-based SA.
Sentiment140 is a publicly available Twitter dataset extracted via the Twitter API, comprising 1,600,000 tweets. The tweets are categorized into three groups (positive, neutral, and negative) based on score values: negative tweets are represented by the score value “0,” neutral tweets by the score value “2,” and positive tweets by the score value “4.” According to these score values, there are 22,542 negative tweets, 18,318 neutral tweets, and 20,832 positive tweets in the dataset, with six attributes. Table 1 illustrates the attributes of the Sentiment140 dataset.
The six attributes of Sentiment140 dataset
| S. no. | Attributes |
|---|---|
| 1. | Text of the tweet |
| 2. | User who tweeted |
| 3. | Whether there is a query on the tweet |
| 4. | Date of tweet |
| 5. | Tweet ID |
| 6. | Polarity of the tweet |
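The score-to-label mapping described above can be sketched in Python; the function name is illustrative, not part of the dataset release:

```python
# Sketch: mapping Sentiment140 polarity scores to class labels,
# following the score convention given above (0/2/4).
SCORE_TO_LABEL = {0: "negative", 2: "neutral", 4: "positive"}

def label_tweet(score: int) -> str:
    """Map a Sentiment140 polarity score (0, 2, or 4) to its class label."""
    try:
        return SCORE_TO_LABEL[score]
    except KeyError:
        raise ValueError(f"Unexpected polarity score: {score}")

print(label_tweet(0))  # negative
print(label_tweet(4))  # positive
```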
The Twitter API dataset, a publicly available dataset, focuses on real-time applications that gather text data from Twitter. The data are categorized into three categories: positive, negative, and neutral tweets from users. These raw tweet data are fed to the initial preprocessing stage to enhance the data by removing unwanted text.
The collected raw text data is first fed into the preprocessing phase to transform the texts into a useful format and enhance the SA. This research employs preprocessing techniques such as URL and hashtag removal, stop word removal, tokenization, and stemming to transform the raw text sentences obtained from the dataset. These techniques are explained as follows:
- ➢ URL and hashtag removal: Tweets include several hashtags and URLs; although these are helpful to users, they constitute noise that is of no use to subsequent processing. Thus, hashtags and URLs are removed from the texts.
- ➢ Stop word removal: After eliminating the hashtags, the tweets are passed to the stop word removal step, which identifies and eliminates irrelevant words in each sentence. Frequently used stop words such as “and,” “is,” “the,” and “so,” which carry no sentiment, are removed from the tweets. Techniques for this step include Z-methods, mutual information, the term-based random sampling (TBRS) method, and others.
- ➢ Tokenization: The tokenization step divides a sentence into smaller units (words, phrases, and other meaningful forms), referred to as “tokens.” Moreover, tokenization helps identify the most meaningful words, which supports accurate SA. These tokens are then passed to the stemming process.
- ➢ Stemming: After tokenization, the tokens are reduced to their root words by the stemming process. In this process, prefixes and suffixes such as “un,” “dis,” “ed,” “im,” and “ing” are removed to convert each word into its root or dictionary form. During stemming, the two rules below are considered:
Words with different meanings should be kept separate.
Morphological variants with similar meanings should be mapped to the same stem. These preprocessed texts (stem words) are forwarded to the next phase, feature extraction.
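The preprocessing steps above can be sketched as a minimal Python pipeline. The stop-word list and suffix rules here are illustrative placeholders, not the exact techniques (e.g., TBRS) used in this research:

```python
import re

# Minimal sketch of the preprocessing pipeline: URL/hashtag removal,
# tokenization, stop-word removal, and naive suffix stemming.
STOP_WORDS = {"and", "is", "the", "so", "a", "an", "to", "of"}  # illustrative subset
SUFFIXES = ("ing", "ed", "s")  # illustrative suffix rules

def preprocess(tweet: str) -> list[str]:
    # 1. Remove URLs and hashtags, which are noise for sentiment analysis.
    tweet = re.sub(r"https?://\S+|#\w+", "", tweet)
    # 2. Tokenize the lowercased text into word tokens.
    tokens = re.findall(r"[a-z']+", tweet.lower())
    # 3. Drop stop words that carry no sentiment.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 4. Naively strip suffixes to approximate a root form.
    stemmed = []
    for t in tokens:
        for suf in SUFFIXES:
            if t.endswith(suf) and len(t) - len(suf) >= 3:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("Loving the weather today! #sunny https://t.co/abc"))
```

A production pipeline would use a full stop-word list and a proper stemmer (e.g., Porter), but the flow of the four steps is the same.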
After preprocessing, the preprocessed data is fed to the feature extraction phase, which is based on a new term weighting technique, namely, LTF-MICF. This feature extraction method combines two distinct term weighting strategies, LTF and MICF, which extract features based on frequently occurring terms in a sentence or among tokens, referred to as term frequency. However, term frequency alone carries insufficient discriminative information. Therefore, this hybrid feature extraction technique addresses the issue by integrating term frequency with the MICF technique. The inverse class frequency is determined as the inverse of the ratio of the number of classes in which a term occurs during training on the tweets to the total number of classes. The LTF-MICF is mathematically formulated as given in Eq. (1):
Where, wsp denotes the weighting factor (WF) for each class; tp denotes the preprocessed terms; and the specific WF is expressed mathematically as in Eq. (2):
Where, si denotes the number of tweets in a class;
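Since Eqs (1) and (2) are not reproduced in this excerpt, the following Python sketch shows one plausible reading of LTF-MICF, log term frequency scaled by an inverse-class-frequency factor; the exact formulation is an assumption:

```python
import math
from collections import defaultdict

# Hedged sketch of LTF-MICF term weighting: log term frequency (LTF)
# multiplied by a modified inverse class frequency (MICF) factor that
# rewards terms concentrated in few classes.
def ltf_micf(docs_by_class: dict[str, list[list[str]]]) -> dict[str, float]:
    classes_with_term = defaultdict(set)  # which classes contain each term
    term_freq = defaultdict(int)          # total occurrences of each term
    for cls, docs in docs_by_class.items():
        for doc in docs:
            for term in doc:
                classes_with_term[term].add(cls)
                term_freq[term] += 1
    n_classes = len(docs_by_class)
    weights = {}
    for term, tf in term_freq.items():
        ltf = math.log(1 + tf)  # log term frequency
        # inverse class frequency: total classes over classes containing the term
        micf = math.log(1 + n_classes / len(classes_with_term[term]))
        weights[term] = ltf * micf
    return weights

w = ltf_micf({
    "positive": [["great", "great", "day"]],
    "negative": [["bad", "day"]],
    "neutral":  [["day"]],
})
# "day" occurs in all three classes, so its class-discrimination factor
# is smaller than that of the class-specific term "great".
```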
The extracted features from the hybrid method are then passed to the feature selection phase, which is based on the principal component analysis (PCA) technique. PCA is extensively used in unsupervised ML algorithms for dimensionality reduction. Similarly, in this research, PCA is employed to reduce the extracted high-dimensional feature vector set to a new low-dimensional vector set and to select the most relevant features. The PCA method first normalizes the data, then constructs a covariance matrix from which eigenvectors are acquired; thereafter, representative eigenvectors are identified. The PCA technique assumes that most of the feature information in the three classes shows significant variation. The process of dimensionality reduction of the feature vector using PCA is as follows:
- 1) Estimate the mean value.
- 2) Identify the covariance matrix of the extracted features.
- 3) Compute the eigenvalues and eigenvectors of the covariance matrix, and sort them in descending order of eigenvalue. Then, obtain the p-dimensional feature subspace (p ≤ M) by selecting the p largest eigenvalues and their corresponding eigenvectors.
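The three steps above can be sketched with NumPy as follows (the input data here is synthetic):

```python
import numpy as np

# Sketch of the PCA feature-selection steps: mean-centering,
# covariance matrix, eigendecomposition, and top-p projection.
def pca_reduce(X: np.ndarray, p: int) -> np.ndarray:
    # 1) Estimate the mean and center the feature vectors.
    Xc = X - X.mean(axis=0)
    # 2) Covariance matrix of the extracted features.
    cov = np.cov(Xc, rowvar=False)
    # 3) Eigendecomposition; sort eigenvalues in descending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    # Keep the p eigenvectors with the largest eigenvalues (p <= M).
    W = eigvecs[:, order[:p]]
    return Xc @ W

X = np.random.default_rng(0).normal(size=(100, 6))  # 100 samples, M = 6 features
X_low = pca_reduce(X, p=2)
print(X_low.shape)  # (100, 2)
```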
After dimensionality reduction and feature selection by PCA, the selected features are forwarded as input to the proposed classification method for SA of social media texts into positive, negative, and neutral classes. For efficient classification of sentiments, a DL-based method is proposed in this research, namely, the GRU, a variant of the recurrent neural network (RNN) model. The GRU model allows predictions to draw on past contextual information about the words and processes the features as a sequence, which is more stable than processing each selected feature individually. The proposed GRU model includes two gates: an update gate, which combines the roles of the input and forget gates found in LSTM, and a reset gate. The function of the update gate is to choose which memory information the model retains up to the present stage. The reset gate controls how much past information is fed to the network. The mathematical formulations of these two gating states, based on the hidden layer state and the input of the present node, are presented in Eqs (3) and (4):
Where, σ denotes the sigmoid function; xt denotes the input text; zt and rt represent the update and reset gates; ht−1 stands for the previous hidden layer output; Wxz and Wxr denote the weights of the update and reset gates; and bz and br represent the bias terms of the update and reset gates.
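Assuming the standard GRU formulation (the recurrent weight matrices on the hidden state are implied by the hidden-layer terms in Eqs (3) and (4) even though only Wxz and Wxr are named above), the two gates can be sketched as:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sketch of the GRU gating equations (3)-(4): the update gate z_t and
# reset gate r_t are computed from the current input x_t and the
# previous hidden state h_{t-1}. The U_* recurrent matrices are the
# standard-GRU assumption noted above.
def gru_gates(x_t, h_prev, Wxz, Uhz, bz, Wxr, Uhr, br):
    z_t = sigmoid(Wxz @ x_t + Uhz @ h_prev + bz)  # update gate
    r_t = sigmoid(Wxr @ x_t + Uhr @ h_prev + br)  # reset gate
    return z_t, r_t

rng = np.random.default_rng(1)
d_in, d_h = 4, 3  # toy input and hidden sizes
z, r = gru_gates(
    rng.normal(size=d_in), np.zeros(d_h),
    rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
    rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
)
# Both gates are sigmoid activations, so every entry lies in (0, 1).
```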
The flow of information extracted from the selected features through the network is governed by these two gates. The update and reset gates determine which information is retained and which is forgotten, which also alleviates the vanishing gradient problem. However, the GRU has certain limitations: it is prone to overfitting, struggles with long-term dependencies, and gives less importance to frequently occurring patterns in the dataset, which hampers its ability to differentiate sentiments efficiently and leads to misclassification.
In DL-based algorithms, the residual, also known as the “residual term or error,” is the difference between the observed values and the values analyzed or predicted by the model. In linear regression, the residual at each data point is estimated by subtracting the predicted value of the dependent variable, determined from the corresponding independent variables and the model parameters, from the observed value. The mathematical representation of the reductive bias rt at time stamp t is expressed in Eq. (5):
Where,
The bias (b̂) corresponding to the overall evaluation is represented in Eq. (7):
This reductive bias is incorporated into the hidden state of the GRU model, enhancing the reset gate and update gate of the GRU with the bias term. This enhancement retains only the information from the selected features that is most relevant to the sentiment. The proposed reductive bias technique thus addresses the limitations of the GRU model and improves its ability to differentiate between the positive, negative, and neutral classes effectively.
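Because Eqs (5)–(7) are not reproduced in this excerpt, the following sketch illustrates one plausible reading of the enhancement, a residual-derived bias term added inside the update-gate activation; the exact mechanism is an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hedged sketch of the reductive-bias idea: a bias term b_hat, estimated
# from residuals as in Eq. (7), shifts the update gate so that
# sentiment-rich tokens (with larger bias) retain more new information.
def biased_update_gate(x_t, h_prev, Wxz, Uhz, bz, b_hat):
    # b_hat is the reductive bias added to the gate pre-activation (assumed form).
    return sigmoid(Wxz @ x_t + Uhz @ h_prev + bz + b_hat)

rng = np.random.default_rng(2)
x, h = rng.normal(size=4), rng.normal(size=3)
Wxz, Uhz, bz = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)
z_plain = biased_update_gate(x, h, Wxz, Uhz, bz, b_hat=0.0)
z_biased = biased_update_gate(x, h, Wxz, Uhz, bz, b_hat=2.0)
# A positive bias pushes the gate toward 1, retaining more new information.
```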
The performance evaluation of the proposed DL-based classifier for SA of social media text data, covering the Sentiment140 and Twitter API datasets, is presented here. The performance of the proposed classifier and feature extraction methods is compared with existing approaches used in SA, as illustrated in Tables 2–5 and Figures 2 and 3. The proposed SA framework employs several performance measures to assess the effectiveness of its methods: accuracy, precision, recall, and F1-score. The mathematical representation of these performance measures is expressed in Eqs (8)–(11):
Performance analysis of the proposed method for Sentiment140 dataset
| Methods | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| CNN | 91.11 | 92.93 | 89.96 | 91.42 |
| LSTM | 93.79 | 94.41 | 91.40 | 92.88 |
| RNN | 95.36 | 96.95 | 93.15 | 95.01 |
| GRU | 96.74 | 97.42 | 95.63 | 96.52 |
| Proposed RD-GRU Method | 98.92 | 98.49 | 97.54 | 98.01 |
CNN, convolutional neural network; GRU, gated recurrent unit; LSTM, long short-term memory; RD-GRU, reductive bias-based gated recurrent unit; RNN, recurrent neural network.
Performance analysis of feature extraction method for Sentiment140 dataset
| Methods | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| Word2Vec | 91.09 | 92.48 | 93.04 | 92.76 |
| Glove | 94.13 | 95.69 | 95.80 | 95.74 |
| Skip gram | 96.33 | 96.29 | 96.36 | 96.32 |
| LTF-MICF | 98.92 | 98.49 | 97.54 | 98.01 |
LTF-MICF, log term frequency-modified inverse class frequency.
Comparative analysis of the proposed method for Sentiment140 dataset
| Methods | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| SVM [17] | 84.25 | 85.83 | 86.37 | 86.13 |
| CNN-LSTM [18] | 97.86 | 96.65 | 96.76 | 96.70 |
| IBAO [21] | 98.73 | 97.56 | 96.46 | 95.89 |
| Proposed RD-GRU | 98.92 | 98.49 | 97.54 | 98.01 |
CNN-LSTM, convolutional neural network-long short-term memory; RD-GRU, reductive bias-based gated recurrent unit.
Comparative analysis of the proposed method for Twitter API dataset
| Methods | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| TSA-CNN-AOA (KNN) | 95.09 | 94.23 | 93.23 | 93.71 |
| Proposed RD-GRU | 97.84 | 97.37 | 96.95 | 97.15 |
RD-GRU, reductive bias-based gated recurrent unit.

Performance analysis of the proposed method for Twitter API dataset. API, application programming interface; CNN, convolutional neural network; GRU, gated recurrent unit; RD-GRU, reductive bias-based gated recurrent unit; RNN, recurrent neural network.
Where, TP denotes true positive; FP, false positive; TN, true negative; and FN, false negative.
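Eqs (8)–(11) correspond to the standard definitions of these metrics, which can be computed directly from the confusion-matrix counts:

```python
# Accuracy, precision, recall, and F1-score (Eqs (8)-(11)) computed
# from true/false positive and negative counts.
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

print(metrics(tp=90, fp=10, tn=85, fn=15))
```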
The quantitative and qualitative analysis of the proposed DL-based classification of emotions for SA of social media texts, using two benchmark datasets, is presented in this section. The existing DL approaches compared include CNN, LSTM, RNN, and GRU. Tables 2 and 3 and Figures 1 and 2 present the performance analysis of the feature extraction and proposed classification models.
Comparative analysis evaluates the proposed method used to analyze sentiments in texts obtained from two benchmark Twitter datasets. Existing SA approaches are utilized to evaluate the proposed approach for comparison. Performance metrics such as accuracy, precision, recall, and F1-score are utilized for comparative analysis of the proposed method. Tables 4 and 5 represent the comparative analysis of the proposed method for Sentiment140 dataset and Twitter API datasets, respectively.
The proposed approach achieved better results, with precise classification of users' sentiments or emotions from the Twitter datasets. SA by various existing ML and DL models struggles to classify emotions accurately due to a lack of informative features from the dataset, a limitation that the presented RD-GRU method overcomes. IBAO improves feature quality by selecting only the most relevant features; it reduces noise, enhances the model's ability to learn meaningful patterns, and boosts classification accuracy, and studies indicate that models using IBAO outperform traditional GRU and LSTM models in sentiment classification tasks [21]. In the SVM approach, the feature selection technique has poor search ability, which impacts the selection of relevant features and reduces the model's performance. CNN-LSTM was unable to analyze certain words because the same word can carry multiple meanings, leading to poor analysis. The CNN model with the arithmetic optimization model struggles to classify emotions accurately because of irrelevant features. In contrast, RD-GRU captures long-term dependencies and temporal patterns in tweet sequences effectively, enabling the model to understand sentiment flow over time [22]. The incorporation of the reductive bias into the hidden state of the GRU model enhances the reset gate and update gate with this bias term [23], so that only the information from the selected features most relevant to the sentiment is utilized.
Figures 3 and 4 illustrate the performance of the RD-GRU model for the SA of Twitter data. In contrast to conventional ML and DL models, which frequently have difficulty classifying sentiments accurately due to feature overlap and insufficient contextual focus, RD-GRU incorporates a reductive bias mechanism inside the GRU framework. This approach improves the model's capacity to concentrate on sentiment-laden elements such as adjectives, adverbs, and negations by altering the update gate. Consequently, RD-GRU sharpens the distinction among positive, negative, and neutral sentiment categories, resulting in more accurate classification. The model adeptly captures long-term dependencies and temporal trends in tweet sequences, providing a deeper comprehension of sentiment progression over time. This focused improvement makes RD-GRU a distinctive and more efficient method for SA of noisy, unstructured social media data.

Performance analysis of feature extraction method for Twitter API dataset. API, application programming interface; LTF-MICF, log term frequency-modified inverse class frequency.
The RD-GRU approach is proposed to enhance the classification of sentiments in Twitter data effectively. Initially, the data from Sentiment140 and the Twitter API are preprocessed by four different techniques, and then features are extracted by the LTF and MICF techniques. The features are then selected efficiently using PCA. Finally, classification is performed by the proposed RD-GRU approach, which categorizes the tweets into positive, negative, and neutral classes. The results of the proposed model for SA, evaluated by the performance metrics, attained accuracies of 98.92% and 97.84% for the Sentiment140 and Twitter API datasets, respectively, and are superior to existing models such as the CNN and LSTM approaches. In the future, transformer-based models will be used to enhance the classification of sentiments in social media text for various Twitter datasets.