Paper Key: IRJ************843
Author: Hardik Jadia
Date Published: 16 Oct 2023
Abstract
This paper investigates sentiment analysis with several text classification models, classifying tweets as positive, negative, or neutral. The study proceeds in stages. Tweet data are collected from online sources in CSV format and preprocessed. Features are then extracted with two techniques, CountVectorizer and TF-IDF Vectorizer, and these features are used to train Logistic Regression and Support Vector Machine (SVM) models. Model performance is assessed with accuracy, precision, recall, F1-score, and ROC curves. Three sentiment classification models are compared: Logistic Regression with CountVectorizer (lr_cv), Logistic Regression with TF-IDF Vectorizer (lr_tfidf), and an SVM. The models differ in precision, recall, and overall accuracy. lr_cv achieves perfect recall for positive sentiment but at the cost of precision, because some neutral and negative tweets are misclassified as positive. lr_tfidf outperforms lr_cv, offering a more balanced trade-off between precision and recall and a robust ROC curve. SVM performs best overall, with high accuracy, balanced precision and recall, and a strong ROC curve, demonstrating its effectiveness at distinguishing positive from negative sentiment. Which model is most suitable nevertheless depends on the objectives of a given application. These findings are intended to help researchers, practitioners, and decision-makers select an appropriate sentiment analysis approach for their needs.
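The pipeline described in the abstract can be sketched with scikit-learn, whose `CountVectorizer` and `TfidfVectorizer` class names match the vectorizers named above. The abstract does not state which library was used, so this is an assumption. The sketch trains the three model variants on a handful of hypothetical toy tweets (not the paper's CSV data). It reports accuracy, macro-averaged precision, recall, and F1, plus a one-vs-rest ROC AUC as a scalar summary of the ROC curves. The features fed to the SVM, the linear kernel (`LinearSVC`), and all hyperparameters are also assumptions, because the abstract does not specify them.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

# Hypothetical toy tweets standing in for the paper's CSV data.
train_texts = [
    "I love this phone, it is amazing",
    "What a great day, feeling happy",
    "Best concert ever, so much fun",
    "Absolutely fantastic service today",
    "I hate waiting in this long line",
    "Worst movie I have ever seen",
    "This update broke everything, terrible",
    "So angry about the delayed flight",
    "The meeting is scheduled for Monday",
    "The store opens at nine tomorrow",
    "Here is the link to the report",
    "The train leaves from platform two",
]
train_labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

test_texts = [
    "Feeling happy, what a fantastic concert",
    "Terrible service, I hate this",
    "The report is scheduled for Monday",
    "Amazing phone, love it",
    "Worst flight ever, so angry",
    "The store link is here",
]
test_labels = ["positive", "negative", "neutral", "positive", "negative", "neutral"]


def build_models():
    """Return the three model variants named in the abstract as pipelines."""
    return {
        # Bag-of-words counts fed to logistic regression.
        "lr_cv": make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000)),
        # TF-IDF weights fed to logistic regression.
        "lr_tfidf": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
        # Assumption: a linear SVM on TF-IDF features (the abstract does not say).
        "svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    }


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model; compute accuracy, macro P/R/F1, and one-vs-rest ROC AUC."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, pred, average="macro", zero_division=0
        )
        # One-vs-rest ROC AUC: binarize the true labels in the model's class
        # order and score each class with the model's decision function.
        y_bin = label_binarize(y_test, classes=model.classes_)
        scores = model.decision_function(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "roc_auc_ovr": roc_auc_score(y_bin, scores, average="macro"),
        }
    return results


results = evaluate(build_models(), train_texts, train_labels, test_texts, test_labels)
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

With a dataset this small, the printed numbers are not meaningful and say nothing about the paper's results. In practice the texts and labels would come from the preprocessed CSV data, for example loaded with `pandas.read_csv`.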
DOI: 10.56726/IRJMETS45265 (https://www.doi.org/10.56726/IRJMETS45265)