Twitter has become an important communication channel in times of emergency.
The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real-time.
Because of this, more agencies are interested in programmatically monitoring Twitter (e.g., disaster relief organizations and news agencies).
The complete approach to the project can be seen on Kaggle:
- Machine Learning approach : https://www.kaggle.com/mohitnirgulkar/disaster-tweets-classification-using-ml
- Deep Learning approach : https://www.kaggle.com/mohitnirgulkar/disaster-tweets-classification-using-deep-learning
- Exploratory Data Analysis
- EDA after Data Cleaning
- Data Preprocessing using NLP
- Machine Learning models for classifying Tweets data
- Deep Learning approach for classifying Tweets data
- Model Deployment
- Packages : Pandas, NumPy, Matplotlib, Plotly, WordCloud, TensorFlow, Scikit-learn, Keras, Keras Tuner, NLTK, etc.
- Dataset : https://www.kaggle.com/c/nlp-getting-started
- Word Embeddings : https://www.kaggle.com/danielwillgeorge/glove6b100dtxt
Visualising Target Variable of the Dataset
Visualising Length of Tweets
Visualising Average word lengths of Tweets
Visualising most common stop words in the text data
Visualising most common punctuations in the text data
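The plots above can be reproduced with a short pandas/Matplotlib sketch like the one below; it assumes the competition file `train.csv` with its `text` and `target` columns and only illustrates the target-distribution and tweet-length views.

```python
# Minimal sketch of the target-distribution and tweet-length plots,
# assuming train.csv from the competition dataset (columns: text, target).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("train.csv")

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Class balance: 1 = real disaster tweet, 0 = not a disaster
df["target"].value_counts().plot(kind="bar", ax=axes[0], title="Target variable")

# Character length of tweets, split by class
df[df["target"] == 1]["text"].str.len().plot(kind="hist", ax=axes[1], alpha=0.6, label="disaster")
df[df["target"] == 0]["text"].str.len().plot(kind="hist", ax=axes[1], alpha=0.6, label="not disaster")
axes[1].set_title("Length of Tweets")
axes[1].legend()

plt.tight_layout()
plt.show()
```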
We use the Python regex (`re`) library and NLTK lemmatization methods for data cleaning.
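A possible cleaning pipeline in that spirit is sketched below; the regex patterns and the `clean_tweet` helper are illustrative, not the exact code from the notebooks, and `df` is the dataframe loaded in the sketch above.

```python
# Illustrative cleaning: strip URLs/HTML/punctuation with re, then lemmatize
# with NLTK (clean_tweet is a hypothetical helper, not from the repository).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("omw-1.4")

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def clean_tweet(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"<.*?>", " ", text)                  # HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)               # punctuation and digits
    tokens = [lemmatizer.lemmatize(w) for w in text.split() if w not in stop_words]
    return " ".join(tokens)

df["clean_text"] = df["text"].apply(clean_tweet)  # df from the earlier sketch
```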
Visualising words inside Real Disaster Tweets
Visualising words inside Fake Disaster Tweets
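These word views can be generated with the WordCloud package listed above; the sketch below assumes the cleaned dataframe from the previous step.

```python
# Word clouds for real (target = 1) and fake (target = 0) disaster tweets.
from wordcloud import WordCloud
import matplotlib.pyplot as plt

for label, title in [(1, "Real Disaster Tweets"), (0, "Fake Disaster Tweets")]:
    words = " ".join(df[df["target"] == label]["clean_text"])
    wc = WordCloud(width=800, height=400, background_color="white").generate(words)
    plt.figure(figsize=(10, 5))
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()
```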
Visualising the top 10 N-grams for N = 1, 2, 3
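One way to compute the top N-grams is with scikit-learn's CountVectorizer, as sketched below; `top_ngrams` is a hypothetical helper, and the notebooks may plot these results instead of printing them.

```python
# Count n-grams (n = 1, 2, 3) over the cleaned tweets and keep the top 10.
from sklearn.feature_extraction.text import CountVectorizer

def top_ngrams(corpus, n=1, top=10):
    vec = CountVectorizer(ngram_range=(n, n)).fit(corpus)
    counts = vec.transform(corpus).sum(axis=0).A1   # total count per n-gram
    ranked = sorted(zip(vec.get_feature_names_out(), counts),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top]

for n in (1, 2, 3):
    print(f"Top {n}-grams:", top_ngrams(df["clean_text"], n=n))
```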
Data preprocessing for the ML models is done using two approaches (a minimal sketch follows the list):
- Bag of Words using CountVectorizer
- Term Frequency and Inverse Document Frequency using TfidfVectorizer
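The sketch below shows both vectorisation approaches side by side; the train/test split ratio and `ngram_range` are assumptions, not the exact settings used in the notebooks.

```python
# Bag of Words and TF-IDF features over the cleaned tweets.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df["clean_text"], df["target"], test_size=0.2, random_state=42)

# Bag of Words
bow = CountVectorizer(ngram_range=(1, 2))
X_train_bow = bow.fit_transform(X_train)
X_test_bow = bow.transform(X_test)

# TF-IDF
tfidf = TfidfVectorizer(ngram_range=(1, 2))
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
```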
Data preprocessing for the DL models is done using tokenization
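A tokenization sketch using the classic Keras preprocessing API is shown below; the vocabulary size and sequence length are assumed values.

```python
# Turn tweets into padded integer sequences for the DL models.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS = 10000  # assumed vocabulary size
MAX_LEN = 50       # assumed maximum tweet length in tokens

tokenizer = Tokenizer(num_words=MAX_WORDS, oov_token="<OOV>")
tokenizer.fit_on_texts(X_train)

train_seq = pad_sequences(tokenizer.texts_to_sequences(X_train), maxlen=MAX_LEN, padding="post")
test_seq = pad_sequences(tokenizer.texts_to_sequences(X_test), maxlen=MAX_LEN, padding="post")
```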
Machine Learning models are trained using different n-grams with both BoW and TF-IDF features, together with a visualisation comparing their accuracy
The best ML model trained, as we can see above, is the Voting Classifier, whose classification report and confusion matrix are shown below
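The sketch below shows one way such a voting ensemble and its report could look; the constituent estimators and their parameters are assumptions, not the exact combination used in the notebook.

```python
# Soft-voting ensemble over TF-IDF features, evaluated with a classification
# report and confusion matrix (estimator mix is illustrative).
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix

voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ],
    voting="soft",
)
voting.fit(X_train_tfidf, y_train)
y_pred = voting.predict(X_test_tfidf)

print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```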
- GloVe word embeddings (embedding dimension = 100) are used to build the embedding matrix for our DL models
- For every DL model we create a builder function and use Keras Tuner to tune it
- Finally, we choose a Bidirectional LSTM for deployment (a sketch follows this list)
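A sketch of the embedding matrix and a Bidirectional LSTM in the spirit of these bullets is shown below; the layer sizes and training settings are assumptions rather than the tuned values, and the Keras Tuner search itself is omitted.

```python
# Build a GloVe (100-d) embedding matrix for the tokenizer's vocabulary and
# train a small Bidirectional LSTM on the padded sequences from above.
import numpy as np
from tensorflow.keras import initializers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

EMBED_DIM = 100
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        glove[parts[0]] = np.asarray(parts[1:], dtype="float32")

vocab_size = min(MAX_WORDS, len(tokenizer.word_index) + 1)
embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, idx in tokenizer.word_index.items():
    if idx < vocab_size and word in glove:
        embedding_matrix[idx] = glove[word]

model = Sequential([
    Embedding(vocab_size, EMBED_DIM,
              embeddings_initializer=initializers.Constant(embedding_matrix),
              trainable=False),
    Bidirectional(LSTM(64)),
    Dropout(0.3),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_seq, y_train, validation_split=0.1, epochs=5, batch_size=32)
```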
- The Bidirectional LSTM model obtained from the Deep Learning approach is used for deployment
- The Flask micro web framework is used to create the web app (a minimal sketch follows this list)
- Heroku is used to deploy the web app at https://disastertweetsdl.herokuapp.com/
- Deep Learning Web app working
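A minimal Flask sketch of the kind of app described above; the route, the template, and the saved-artifact filenames (`bilstm_model.h5`, `tokenizer.pkl`, `index.html`) are illustrative assumptions.

```python
# Hypothetical Flask app: load the saved BiLSTM and tokenizer, classify a
# tweet submitted through a form, and render the result.
import pickle
from flask import Flask, request, render_template
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

app = Flask(__name__)
model = load_model("bilstm_model.h5")                  # assumed filename
tokenizer = pickle.load(open("tokenizer.pkl", "rb"))   # assumed filename

@app.route("/", methods=["GET", "POST"])
def predict():
    prediction = None
    if request.method == "POST":
        tweet = request.form["tweet"]
        seq = pad_sequences(tokenizer.texts_to_sequences([tweet]),
                            maxlen=50, padding="post")
        prediction = "Real disaster" if model.predict(seq)[0][0] > 0.5 else "Not a disaster"
    return render_template("index.html", prediction=prediction)

if __name__ == "__main__":
    app.run(debug=True)
```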
- We can always use a larger dataset that covers almost every type of data, for both machine learning and deep learning
- We can use the best pretrained models, but they require a lot of computational power
- There are also various ways to increase model accuracy, such as k-fold cross-validation and data preprocessing techniques better than those used here
The data analysis and modelling were completed successfully, and the Deep Learning model was deployed on Heroku.
Please ⭐ the repository if it helped you in any way.