Twitter has become an important communication channel in times of emergency.
The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real-time.
Because of this, more agencies are interested in programmatically monitoring Twitter (e.g., disaster relief organizations and news agencies).
The complete approach to the project can be seen on Kaggle:
- Machine Learning approach : https://www.kaggle.com/mohitnirgulkar/disaster-tweets-classification-using-ml
- Deep Learning approach : https://www.kaggle.com/mohitnirgulkar/disaster-tweets-classification-using-deep-learning
- Exploratory Data Analysis
- EDA after Data Cleaning
- Data Preprocessing using NLP
- Machine Learning models for classifying Tweets data
- Deep Learning approach for classifying Tweets data
- Model Deployment
- Packages : Pandas, NumPy, Matplotlib, Plotly, WordCloud, TensorFlow, Scikit-learn, Keras, Keras Tuner, NLTK, etc.
- Dataset : https://www.kaggle.com/c/nlp-getting-started
- Word Embeddings : https://www.kaggle.com/danielwillgeorge/glove6b100dtxt
Visualising Target Variable of the Dataset
Visualising Length of Tweets
Visualising Average word lengths of Tweets
Visualising most common stop words in the text data
Visualising most common punctuations in the text data
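The plots above can be reproduced with a short pandas/Matplotlib sketch like the one below; it assumes the competition file `train.csv` with its `text` and `target` columns and only illustrates the target-distribution and tweet-length views.

```python
# Minimal sketch of the target-distribution and tweet-length plots,
# assuming train.csv from the competition dataset (columns: text, target).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("train.csv")

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Class balance: 1 = real disaster tweet, 0 = not a disaster
df["target"].value_counts().plot(kind="bar", ax=axes[0], title="Target variable")

# Character length of tweets, split by class
df[df["target"] == 1]["text"].str.len().plot(kind="hist", ax=axes[1], alpha=0.6, label="disaster")
df[df["target"] == 0]["text"].str.len().plot(kind="hist", ax=axes[1], alpha=0.6, label="not disaster")
axes[1].set_title("Length of Tweets")
axes[1].legend()

plt.tight_layout()
plt.show()
```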
We use the Python regex (`re`) library and NLTK lemmatization methods for data cleaning.
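A possible cleaning pipeline in that spirit is sketched below; the regex patterns and the `clean_tweet` helper are illustrative, not the exact code from the notebooks, and `df` is the dataframe loaded in the sketch above.

```python
# Illustrative cleaning: strip URLs/HTML/punctuation with re, then lemmatize
# with NLTK (clean_tweet is a hypothetical helper, not from the repository).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("omw-1.4")

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def clean_tweet(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"<.*?>", " ", text)                  # HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)               # punctuation and digits
    tokens = [lemmatizer.lemmatize(w) for w in text.split() if w not in stop_words]
    return " ".join(tokens)

df["clean_text"] = df["text"].apply(clean_tweet)  # df from the earlier sketch
```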
Visualising words inside Real Disaster Tweets
Visualising words inside Fake Disaster Tweets
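These word views can be generated with the WordCloud package listed above; the sketch below assumes the cleaned dataframe from the previous step.

```python
# Word clouds for real (target = 1) and fake (target = 0) disaster tweets.
from wordcloud import WordCloud
import matplotlib.pyplot as plt

for label, title in [(1, "Real Disaster Tweets"), (0, "Fake Disaster Tweets")]:
    words = " ".join(df[df["target"] == label]["clean_text"])
    wc = WordCloud(width=800, height=400, background_color="white").generate(words)
    plt.figure(figsize=(10, 5))
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()
```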
Visualising the top 10 N-grams for N = 1, 2, 3
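One way to compute the top N-grams is with scikit-learn's CountVectorizer, as sketched below; `top_ngrams` is a hypothetical helper, and the notebooks may plot these results instead of printing them.

```python
# Count n-grams (n = 1, 2, 3) over the cleaned tweets and keep the top 10.
from sklearn.feature_extraction.text import CountVectorizer

def top_ngrams(corpus, n=1, top=10):
    vec = CountVectorizer(ngram_range=(n, n)).fit(corpus)
    counts = vec.transform(corpus).sum(axis=0).A1   # total count per n-gram
    ranked = sorted(zip(vec.get_feature_names_out(), counts),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top]

for n in (1, 2, 3):
    print(f"Top {n}-grams:", top_ngrams(df["clean_text"], n=n))
```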
Data preprocessing for the ML models is done using two approaches (a minimal sketch follows the list):
- Bag of Words using CountVectorizer
- Term Frequency and Inverse Document Frequency using TfidfVectorizer
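The sketch below shows both vectorisation approaches side by side; the train/test split ratio and `ngram_range` are assumptions, not the exact settings used in the notebooks.

```python
# Bag of Words and TF-IDF features over the cleaned tweets.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df["clean_text"], df["target"], test_size=0.2, random_state=42)

# Bag of Words
bow = CountVectorizer(ngram_range=(1, 2))
X_train_bow = bow.fit_transform(X_train)
X_test_bow = bow.transform(X_test)

# TF-IDF
tfidf = TfidfVectorizer(ngram_range=(1, 2))
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
```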
Data preprocessing for the DL models is done using tokenization
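A tokenization sketch using the classic Keras preprocessing API is shown below; the vocabulary size and sequence length are assumed values.

```python
# Turn tweets into padded integer sequences for the DL models.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS = 10000  # assumed vocabulary size
MAX_LEN = 50       # assumed maximum tweet length in tokens

tokenizer = Tokenizer(num_words=MAX_WORDS, oov_token="<OOV>")
tokenizer.fit_on_texts(X_train)

train_seq = pad_sequences(tokenizer.texts_to_sequences(X_train), maxlen=MAX_LEN, padding="post")
test_seq = pad_sequences(tokenizer.texts_to_sequences(X_test), maxlen=MAX_LEN, padding="post")
```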
Machine Learning models are trained using different n-grams with both BoW and TF-IDF features, together with a visualisation comparing their accuracy
The best ML model trained, as we can see above, is the Voting Classifier, whose classification report and confusion matrix are shown below
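The sketch below shows one way such a voting ensemble and its report could look; the constituent estimators and their parameters are assumptions, not the exact combination used in the notebook.

```python
# Soft-voting ensemble over TF-IDF features, evaluated with a classification
# report and confusion matrix (estimator mix is illustrative).
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix

voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ],
    voting="soft",
)
voting.fit(X_train_tfidf, y_train)
y_pred = voting.predict(X_test_tfidf)

print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```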
- GloVe word embeddings (embedding dimension = 100) are used to build the embedding matrix for our DL models
- For every DL model we create a builder function and use Keras Tuner to tune it
- Finally, we choose a Bidirectional LSTM for deployment (a sketch follows this list)
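A sketch of the embedding matrix and a Bidirectional LSTM in the spirit of these bullets is shown below; the layer sizes and training settings are assumptions rather than the tuned values, and the Keras Tuner search itself is omitted.

```python
# Build a GloVe (100-d) embedding matrix for the tokenizer's vocabulary and
# train a small Bidirectional LSTM on the padded sequences from above.
import numpy as np
from tensorflow.keras import initializers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

EMBED_DIM = 100
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        glove[parts[0]] = np.asarray(parts[1:], dtype="float32")

vocab_size = min(MAX_WORDS, len(tokenizer.word_index) + 1)
embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, idx in tokenizer.word_index.items():
    if idx < vocab_size and word in glove:
        embedding_matrix[idx] = glove[word]

model = Sequential([
    Embedding(vocab_size, EMBED_DIM,
              embeddings_initializer=initializers.Constant(embedding_matrix),
              trainable=False),
    Bidirectional(LSTM(64)),
    Dropout(0.3),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_seq, y_train, validation_split=0.1, epochs=5, batch_size=32)
```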
- The Bidirectional LSTM model obtained from the Deep Learning approach is used for deployment
- The Flask micro web framework is used to create the web app (a minimal sketch follows this list)
- Heroku is used to deploy the web app at https://disastertweetsdl.herokuapp.com/
- Deep Learning Web app working
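A minimal Flask sketch of the kind of app described above; the route, the template, and the saved-artifact filenames (`bilstm_model.h5`, `tokenizer.pkl`, `index.html`) are illustrative assumptions.

```python
# Hypothetical Flask app: load the saved BiLSTM and tokenizer, classify a
# tweet submitted through a form, and render the result.
import pickle
from flask import Flask, request, render_template
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

app = Flask(__name__)
model = load_model("bilstm_model.h5")                  # assumed filename
tokenizer = pickle.load(open("tokenizer.pkl", "rb"))   # assumed filename

@app.route("/", methods=["GET", "POST"])
def predict():
    prediction = None
    if request.method == "POST":
        tweet = request.form["tweet"]
        seq = pad_sequences(tokenizer.texts_to_sequences([tweet]),
                            maxlen=50, padding="post")
        prediction = "Real disaster" if model.predict(seq)[0][0] > 0.5 else "Not a disaster"
    return render_template("index.html", prediction=prediction)

if __name__ == "__main__":
    app.run(debug=True)
```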
- We can always use a larger dataset that covers almost every type of data, for both machine learning and deep learning
- We can use the best pretrained models, but they require a lot of computational power
- There are also various ways to increase model accuracy, such as k-fold cross-validation and data preprocessing techniques better than those used here
The data analysis and modelling were completed successfully, and the Deep Learning model was deployed on Heroku.
Please ⭐ the repository if it helped you in any way.