Fine-tuned BERT and scikit-learn models for real-time classification of disaster-related tweets, built with TensorFlow, Keras, and Hugging Face Transformers.
This repository hosts the Tweet Disaster Detection system, an NLP solution designed to identify disaster-related tweets in real time. Social media use keeps growing, so rapidly detecting potential disaster events in user-generated content has become crucial for timely intervention and response.
The project uses the following libraries and tools:

- TensorFlow and Keras: Used for implementing and fine-tuning the BERT model.
- Hugging Face Transformers: Provides pre-trained BERT models and utilities for tokenization, model fine-tuning, and other NLP tasks.
- scikit-learn: Used for traditional machine learning tasks, including the Naive Bayes model and performance evaluation metrics.
- Matplotlib: Used for plotting learning curves, confusion matrices, and other visualizations that help analyze model performance.
- Pandas: Facilitates data manipulation and analysis, making it easier to preprocess the tweet data and prepare it for model training.
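The Naive Bayes baseline listed above runs through scikit-learn, and this README does not describe its configuration. The sketch below illustrates the technique itself rather than the repository's code. It implements one common variant, multinomial Naive Bayes with add-one (Laplace) smoothing, from scratch. The whitespace tokenizer, the smoothing value, the function names `train_nb` and `predict_nb`, and the four example tweets are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace
    return text.lower().split()


def train_nb(texts, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with additive (Laplace) smoothing."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    # log P(class) from class frequencies
    log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in classes}
    # log P(word | class) with alpha added to every vocabulary count
    log_likelihood = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return classes, log_prior, log_likelihood


def predict_nb(model, text):
    """Return the class with the highest posterior log-score."""
    classes, log_prior, log_likelihood = model
    scores = {}
    for c in classes:
        # Words never seen during training are ignored
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokenize(text) if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy data (hypothetical): 1 = real disaster, 0 = not a disaster
example_texts = [
    "forest fire near the town",
    "flood waters rising evacuate now",
    "i love this new song",
    "what a lovely sunny day",
]
example_labels = [1, 1, 0, 0]
nb_model = train_nb(example_texts, example_labels)
print(predict_nb(nb_model, "fire and flood in town"))  # 1
```

Working in log space turns the product of per-word probabilities into a sum. This avoids floating-point underflow on longer tweets.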
The system distinguishes tweets that report real disasters from those that do not. It pairs a classical machine learning baseline (Naive Bayes) with a fine-tuned deep learning model (BERT). In the evaluation below, BERT reaches 86% precision and 85% accuracy.
Our primary model is a fine-tuned version of BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art transformer model originally developed by Google. BERT's ability to understand context and disambiguate meaning in text makes it particularly suited for this task.
Preprocessing:
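The model defined under Model Architecture expects three integer inputs per tweet: `input_word_ids`, `input_mask`, and `segment_ids`. The sketch below shows how an already-tokenized tweet could be packed into them. It assumes the standard BERT layout of `[CLS] tokens [SEP]` followed by padding, with the `[CLS]`/`[SEP]`/`[PAD]` ids of the bert-base-uncased vocabulary as defaults. The function name `pack_bert_inputs` and the token ids in the example are hypothetical. A real pipeline would obtain WordPiece ids from the BERT tokenizer.

```python
def pack_bert_inputs(token_ids, max_seq_length, cls_id=101, sep_id=102, pad_id=0):
    """Pack one tokenized tweet into BERT's three fixed-length inputs.

    Defaults are the [CLS]/[SEP]/[PAD] ids of the bert-base-uncased vocabulary.
    """
    # Reserve two positions for [CLS] and [SEP], truncating longer tweets
    body = list(token_ids)[: max_seq_length - 2]
    word_ids = [cls_id] + body + [sep_id]
    n_real = len(word_ids)
    padding = max_seq_length - n_real
    input_word_ids = word_ids + [pad_id] * padding
    # 1 marks real tokens the model attends to, 0 marks padding
    input_mask = [1] * n_real + [0] * padding
    # A single tweet is one segment, so every position gets segment id 0
    segment_ids = [0] * max_seq_length
    return input_word_ids, input_mask, segment_ids


# Arbitrary example token ids for a two-token tweet
ids, mask, segs = pack_bert_inputs([2543, 2006], max_seq_length=6)
print(ids)   # [101, 2543, 2006, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
print(segs)  # [0, 0, 0, 0, 0, 0]
```

Stacking these lists for a batch of tweets gives `(batch, max_seq_length)` integer arrays, one for each of the three `Input` layers.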
Model Architecture:
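```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def build_model(bert_layer, max_seq_length):
    # Token ids, attention mask and segment ids, each of shape (batch, max_seq_length)
    input_word_ids = Input(shape=(max_seq_length,), dtype=tf.int32, name='input_word_ids')
    input_mask = Input(shape=(max_seq_length,), dtype=tf.int32, name='input_mask')
    segment_ids = Input(shape=(max_seq_length,), dtype=tf.int32, name='segment_ids')
    # bert_layer returns (pooled_output, sequence_output); only sequence_output is used
    pooled_output, sequence_output = bert_layer([input_word_ids, input_mask, segment_ids])
    clf_output = sequence_output[:, 0, :]  # final hidden state of the [CLS] token
    out = Dense(1, activation='sigmoid')(clf_output)  # probability of a real disaster
    return Model(inputs=[input_word_ids, input_mask, segment_ids], outputs=out)
```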
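The following makes the classification head concrete. `sequence_output` has shape `(batch, max_seq_length, hidden_size)`. The slice `[:, 0, :]` keeps the final hidden state of the `[CLS]` token, and `Dense(1, activation='sigmoid')` maps that vector to a probability. The pure-Python sketch below mirrors this computation for a single tweet with a toy hidden size of 3. The weights, the bias, and the 0.5 decision threshold are made-up values for illustration; the threshold is an assumption this README does not state.

```python
import math


def classify_from_sequence_output(sequence_output, weights, bias, threshold=0.5):
    """Mirror sequence_output[:, 0, :] -> Dense(1, sigmoid) for one tweet.

    sequence_output: list of per-token hidden vectors, shape (seq_len, hidden_size).
    threshold: hypothetical cut-off for turning the probability into a label.
    """
    cls_vector = sequence_output[0]  # hidden state at position 0 ([CLS])
    logit = sum(w * h for w, h in zip(weights, cls_vector)) + bias
    probability = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return probability, int(probability >= threshold)


# Toy example: 3 tokens, hidden size 3; only the [CLS] row affects the result
seq = [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0], [-9.0, -9.0, -9.0]]
prob, label = classify_from_sequence_output(seq, weights=[1.0, 0.5, 0.25], bias=0.0)
print(round(prob, 4), label)  # 0.6225 1
```

Because only position 0 is read, the other token vectors influence the prediction only indirectly, through BERT's self-attention layers that produced the `[CLS]` state.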
Training Strategy:
The model is fine-tuned with a small learning rate of 0.0001 and a momentum of 0.8, chosen to favor convergence and stability during fine-tuning. Training runs for multiple epochs while accuracy, precision, recall, and F1-score are tracked to monitor performance.

| Model | Precision | Recall | Accuracy | F1-Score |
|---|---|---|---|---|
| BERT | 86% | 84% | 85% | 86% |
| Naive Bayes | 82% | 70% | 56% | 75% |
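The training strategy above gives only a learning rate and a momentum value, without naming the optimizer. Assume these are the hyperparameters of stochastic gradient descent with momentum, as in Keras' `tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.8)`. Each weight is then updated as `velocity = momentum * velocity - learning_rate * gradient`, followed by `w = w + velocity`. The sketch below applies that rule to a one-dimensional toy loss. The loss function and the larger demo learning rate of 0.1 are illustrative choices, not values from this project.

```python
def sgd_momentum(grad_fn, w0, learning_rate, momentum, steps):
    """Minimize a 1-D function with SGD plus momentum, using the Keras-style update."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        g = grad_fn(w)
        # Velocity is an exponentially decaying accumulation of past gradient steps
        velocity = momentum * velocity - learning_rate * g
        w = w + velocity
    return w


def quadratic_grad(w):
    # Gradient of the toy loss L(w) = (w - 3)^2, whose minimum is at w = 3
    return 2.0 * (w - 3.0)


w_demo = sgd_momentum(quadratic_grad, w0=0.0, learning_rate=0.1, momentum=0.8, steps=200)
w_small = sgd_momentum(quadratic_grad, w0=0.0, learning_rate=1e-4, momentum=0.8, steps=200)
print(round(w_demo, 4))  # 3.0
print(w_small < 1.0)     # True: with lr = 1e-4, 200 steps cover only part of the way
```

Under this rule, when consecutive gradients point the same way, a momentum of 0.8 lets the velocity grow to about 1 / (1 - 0.8) = 5 times a single gradient step. The small learning rate still keeps each individual update modest, which is why such settings are typically paired with several epochs of training.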
Throughout training, several visualizations were generated with Matplotlib, including learning curves and confusion matrices.
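All four metric columns in the results table can be derived from a binary confusion matrix of true/false positives and negatives. The sketch below computes them in pure Python, treating "real disaster" as the positive class. The function name `binary_metrics` and the toy labels are hypothetical. scikit-learn provides the same metrics as `precision_score`, `recall_score`, `accuracy_score`, and `f1_score` in `sklearn.metrics`.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, accuracy and F1 for binary labels (1 = real disaster)."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix cells
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Toy labels: 3 true positives, 1 false negative, 1 false positive, 5 true negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
# {'precision': 0.75, 'recall': 0.75, 'accuracy': 0.8, 'f1': 0.75}
```

Precision penalizes false alarms and recall penalizes missed disasters. This is why the two can diverge, as they do for Naive Bayes in the table.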
Our model has several potential real-world applications:

- Preventing Accidents: By identifying tweets that signal real disasters, our system can alert first responders and relevant authorities, potentially preventing accidents or minimizing damage.
- Early Warning Systems: The model can provide early warnings of disasters, giving people time to prepare or evacuate to safety.
- Accurate Disaster Reporting: By filtering out false or irrelevant tweets, our system can improve the accuracy of disaster reporting, so that people receive trustworthy information during crises.
The Tweet Disaster Detection system demonstrates how modern NLP techniques can be applied to critical real-world scenarios. The fine-tuned BERT model reaches 86% precision and 85% accuracy, so the project shows promise for supporting disaster management and response strategies.
We are committed to further refining this system and exploring its applications across different domains to make the world a safer place.
Clone the Repository:

```bash
git clone https://github.com/deepmancer/tweet-disaster-detection.git
cd tweet-disaster-detection
```

Install Dependencies:

```bash
pip install -r requirements.txt
```

Run the Jupyter Notebook: open `Advanced_Data_Science_Capstone.ipynb` to explore the code and see the results.

Predict Disaster Tweets: