Dataset: IMDB data
Sentiment analysis of movie reviews
BERT and LSTM models are trained on an IMDB movie review dataset, taken from Kaggle, to classify each review as positive or negative. The prediction, either positive or negative, is returned to the user.
Text preprocessing is traditionally an important step in natural language processing (NLP) tasks. It transforms text into a more digestible form so that machine learning algorithms can perform better. The following preprocessing steps are used in this project:
- Removing all URLs from the data
- Removing all HTML tags from the data
- Expanding contractions (for example, "won't" becomes "will not")
- Removing special characters from the data
- Removing stop words
- Stemming and lemmatization
- TF-IDF vectorization
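The preprocessing steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the project's actual code: the names `clean_review`, `tfidf`, `CONTRACTIONS`, and `STOP_WORDS` are hypothetical, the lookup tables are deliberately tiny, and stemming and lemmatization are left out because they usually rely on a library such as NLTK.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small lookup tables; a real project would use
# fuller lists (for example, a library-provided English stop word list).
CONTRACTIONS = {"won't": "will not", "can't": "cannot", "n't": " not",
                "'re": " are", "'ve": " have", "'ll": " will", "'m": " am"}
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "was", "and", "of", "to", "i"}


def clean_review(text):
    """Apply the cleaning steps in order and return a list of tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"<[^>]+>", " ", text)                # remove HTML tags
    for short, full in CONTRACTIONS.items():            # expand contractions
        text = text.replace(short, full)
    text = re.sub(r"[^a-z\s]", " ", text)               # remove special characters
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses term frequency times log(N / document frequency), with no
    smoothing or normalization.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


reviews = ["I can't believe it!<br />Visit http://example.com. The acting wasn't great",
           "This movie was great"]
tokens = [clean_review(r) for r in reviews]
weights = tfidf(tokens)
```

Library vectorizers such as scikit-learn's `TfidfVectorizer` apply IDF smoothing and vector normalization by default, so their weights will differ from this unsmoothed version.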
Two deep learning methods are used to train the model:
- BERT (Bidirectional Encoder Representations from Transformers)
- LSTM (Long short-term memory)
The trained model is then made available to users by:
- Creating a Flask app for better user interaction with the model
- Deploying the model using Heroku
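A Flask app that serves predictions might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the `/predict` route, the `review` JSON field, and the `predict_sentiment` helper are hypothetical, and a simple keyword rule stands in for the trained BERT or LSTM model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_sentiment(review):
    """Hypothetical placeholder for inference with the trained model."""
    # A trivial keyword rule stands in for the real model in this sketch.
    return "positive" if "good" in review.lower() else "negative"


@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    review = payload.get("review", "")
    if not isinstance(review, str) or not review.strip():
        return jsonify({"error": "field 'review' is required"}), 400
    return jsonify({"review": review, "prediction": predict_sentiment(review)})
```

The endpoint can be exercised without starting a server by using Flask's built-in test client, for example `app.test_client().post("/predict", json={"review": "A good film"})`. On Heroku, apps like this are commonly started through a `Procfile` that launches a WSGI server such as gunicorn, though the exact setup depends on the project.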