This repository contains a project that demonstrates how to perform sentiment analysis on Twitter data using Apache Spark, including data preprocessing, feature engineering, model training, and evaluation.
The Twitter Sentiment Analysis repository contains a project for performing sentiment analysis on Twitter data using Apache Spark.
Sentiment_Analysis.ipynb: Jupyter Notebook containing the code for the sentiment analysis.
Sentiment.csv: The dataset file containing the Twitter data and sentiment labels.
This project demonstrates how to use Apache Spark for sentiment analysis on Twitter data. The steps covered in the project include data preprocessing, feature engineering, model training, and evaluation.
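The four stages named in the project description (data preprocessing, feature engineering, model training, and evaluation) could be wired together roughly as in the sketch below. It is not taken from the notebook: the column names `text` and `sentiment`, the TF-IDF features, the logistic regression classifier, and the helper names `clean_tweet`, `build_pipeline`, and `train_and_evaluate` are all assumptions made for illustration. The pyspark imports sit inside the functions so that the cleaning helper can be tried without a Spark installation.

```python
import re

# Patterns for noise that is common in tweets.
_URL = re.compile(r"https?://\S+|www\.\S+")
_MENTION = re.compile(r"@\w+")
_NON_ALPHA = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and non-letter characters."""
    if text is None:
        return ""
    text = text.lower()
    text = _URL.sub(" ", text)
    text = _MENTION.sub(" ", text)
    text = _NON_ALPHA.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def build_pipeline(text_col="clean_text", label_col="sentiment"):
    """Feature engineering plus classifier (hypothetical column names)."""
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import (
        HashingTF, IDF, StopWordsRemover, StringIndexer, Tokenizer,
    )

    stages = [
        Tokenizer(inputCol=text_col, outputCol="tokens"),
        StopWordsRemover(inputCol="tokens", outputCol="filtered"),
        HashingTF(inputCol="filtered", outputCol="tf", numFeatures=1 << 16),
        IDF(inputCol="tf", outputCol="features"),
        # Map string sentiment labels to numeric indices.
        StringIndexer(inputCol=label_col, outputCol="label", handleInvalid="skip"),
        LogisticRegression(featuresCol="features", labelCol="label", maxIter=20),
    ]
    return Pipeline(stages=stages)


def train_and_evaluate(df, text_col="text", label_col="sentiment", seed=42):
    """Clean the text, fit on an 80/20 split, and return test accuracy."""
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    clean_udf = udf(clean_tweet, StringType())
    df = df.withColumn("clean_text", clean_udf(df[text_col]))
    train, test = df.randomSplit([0.8, 0.2], seed=seed)
    model = build_pipeline("clean_text", label_col).fit(train)
    predictions = model.transform(test)
    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy"
    )
    return evaluator.evaluate(predictions)
```

With a running SparkSession named `spark`, something like `train_and_evaluate(spark.read.csv("Sentiment.csv", header=True))` would exercise the whole flow, provided the CSV really has columns with the assumed names.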
Required Python libraries: pandas, numpy, nltk, pyspark
Clone the repository:
git clone https://github.com/burhanahmed1/Twitter-Sentiment-Analysis-Using-PySpark.git
cd Twitter-Sentiment-Analysis-Using-PySpark
Install the required Python libraries:
pip install -r requirements.txt
or install the packages directly:
pip install pandas numpy nltk pyspark
Start Jupyter Notebook:
jupyter notebook
Open Sentiment_Analysis.ipynb in Jupyter Notebook and run the cells to execute the project.
The project demonstrates the effectiveness of using Apache Spark for sentiment analysis on large datasets. The final model achieves good accuracy in classifying the sentiment of tweets.
Reported values: 0.62; 0.7331581773635055 and 0.07966124788395001, respectively; 0.73; 0.75.
Confusion matrices are used to evaluate the performance of the sentiment classification model, and scatter plots are used to visualize the distribution and relationships of features in the dataset.
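To make the confusion-matrix evaluation concrete, here is a minimal pure-Python sketch of how such a matrix and the metrics derived from it can be computed. The labels and predictions are made-up toy data, not results from this project, and the function names are hypothetical; the notebook may well use a library routine instead.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Share of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def precision_recall(matrix, index):
    """Precision and recall for the class at position `index`."""
    true_positives = matrix[index][index]
    predicted = sum(row[index] for row in matrix)  # column sum
    actual = sum(matrix[index])                    # row sum
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Toy data for illustration only.
labels = ["neg", "pos"]
y_true = ["pos", "neg", "pos", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "pos"]

matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)
print(accuracy(matrix))
print(precision_recall(matrix, labels.index("pos")))
```

Reading the matrix row by row shows where the classifier confuses one sentiment class with another, which a single accuracy figure hides.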
Contributions are welcome! If you have any ideas, suggestions, or improvements, feel free to open an issue or submit a pull request.
This project is licensed under the MIT License.
Thanks to the open-source community for providing valuable tools and libraries.