NLP - Sentiment Analysis of Amazon reviews

Sentiment analysis of Amazon reviews for a product using NLP and the Vader lexicon sentiment analyzer

Sentiment analysis is used every day to understand how people respond to a particular subject, tweet, product, and so on. Using NLP (Natural Language Processing) in conjunction with ML/DL/AI, we can extract reviews from a web page and analyze the sentiments attached to them.

image.png

Objective: Amazon has millions of products that users review on a daily basis. A review can be positive, negative, or simply neutral. These reviews help both the shopping portal and the brands selling their products on it. The reviews are extracted, processed with NLP techniques, and then analyzed for the sentiment they hold. Brands can use the results to improve their products or give better offers, and shopping sites can use them to improve visibility through recommendations and offers that attract more customers.

Business problem: Extract the reviews for a particular product (in this case the iPhone 13) and check how users feel about the product, using different techniques. Create wordclouds of the positive and negative sentiments to see which words are being used, and check the frequency of those words.

Methodology: The project follows this roadmap:

  • Data Collection with Web Scraping
  • Data Cleaning and Transformation
  • Exploratory Data Analysis (EDA)
  • Sentiment Analysis using Vader Lexicon
  • Alternate method with TF-IDF extraction
  • Forming Wordclouds
  • Bigram analysis
  • Frequency distribution of bigrams
  • Bigram wordcloud

Data Collection with Web Scraping using Beautiful Soup: Review text is extracted from a website with two libraries, “requests” and “Beautiful Soup”. In web scraping, “requests” fetches the content of a page from its url, and “Beautiful Soup” extracts the text from the returned html. We need to be careful when selecting the text, because picking the wrong span/div/class will not give the right results.

image.png

As seen in the code snippet above, the url is taken from Amazon.in for the iPhone 13, and the attributes are found by inspecting the html of the page. A product page contains many things, such as reviews, other text, images and star ratings, so we need to target exactly the portion that holds the review text. To extract reviews from many pages (20 in our case, as set by the range in the “for loop”), open the complete list of reviews from the bottom of the product page, go to the 2nd page, and copy its url. When pasting the url into the code, delete the page number 2 from it, so that the for loop can supply each page number in the range and extract reviews from all 20 pages. The extracted reviews are stored in the object “ipr”.
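The page loop described above could look roughly like the sketch below. It is an illustration, not the notebook's exact code: the function names, the `data-hook="review-body"` selector, the `User-Agent` header and the one-second delay are all assumptions, and the url is expected to stop exactly where the page number was deleted. Only the result name `ipr` comes from the article.

```python
import time


def page_urls(base_url, pages):
    """Build one url per page by appending the page number to base_url."""
    return [f"{base_url}{page}" for page in range(1, pages + 1)]


def extract_reviews(html):
    """Return the text of every review body found in one page of html."""
    # Third-party import (pip install beautifulsoup4), kept local so the
    # url helper above works without it.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # Assumption: review text sits in <span data-hook="review-body">.
    # Inspect the live page and adjust the tag/attribute if it differs.
    spans = soup.find_all("span", attrs={"data-hook": "review-body"})
    return [span.get_text(" ", strip=True) for span in spans]


def scrape_reviews(base_url, pages=20, delay=1.0):
    """Fetch each review page in turn and collect all review texts."""
    import requests  # third-party: pip install requests

    headers = {"User-Agent": "Mozilla/5.0"}  # assumption: browser-like header
    ipr = []
    for url in page_urls(base_url, pages):
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # stop early on a blocked or missing page
        ipr.extend(extract_reviews(response.text))
        time.sleep(delay)  # pause between requests to be polite
    return ipr
```

Calling `scrape_reviews(url_without_page_number, pages=20)` would then return a flat list of review strings. If the list comes back empty, the selector is the first thing to check.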

Data Cleaning and Transformation: The extracted text is raw and contains many elements that might hinder the analysis. To clean the data we remove stop words, unwanted text, punctuation marks, etc. The text is then split on spaces, and the words are stemmed to give a clearer picture. For this we define a function, as shown below, and apply it to the text.

image.png
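A cleaning function along the lines described above might look like the following sketch. The stop word list here is a tiny hand-written sample and the url removal is one guess at what "unwanted text" covers, so both are assumptions. The stemmer is passed in as an optional function so the example runs without extra libraries.

```python
import re
import string

# Assumption: a small illustrative stop word list. A real run would more
# likely load a full list, for example from NLTK's stopwords corpus.
STOP_WORDS = {
    "a", "an", "the", "is", "it", "this", "that", "and", "or",
    "to", "of", "for", "in", "on", "i", "was", "very",
}


def clean_review(text, stem=None):
    """Lowercase, strip urls and punctuation, drop stop words, then stem."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove urls (assumed unwanted)
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = [word for word in text.split() if word not in STOP_WORDS]
    if stem is not None:
        words = [stem(word) for word in words]  # stem is any str -> str function
    return words


print(clean_review("This phone is GREAT, and the camera is amazing!!!"))
```

To add stemming with NLTK, one option is `clean_review(text, stem=PorterStemmer().stem)` after `from nltk.stem import PorterStemmer`.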

Sentiment Analyzer using Vader Lexicon

The extracted and cleaned words are joined back together and stored in the text column of the data. Each entry is then checked for sentiment using the Vader Lexicon library and its sentiment analyzer.

image.png

This lets us separate the reviews into positive, negative and neutral sentiment. The reviews can additionally be scored to see which sentiment users express most often for the product.

image.png
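As a rough sketch of the scoring and labelling step, the code below uses NLTK's VADER analyzer and turns its compound score into a label. The ±0.05 cut-off is the commonly quoted VADER convention and is an assumption about this project, as are the function names.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_reviews(reviews):
    """Return (review, compound score, label) for each review string."""
    # Third-party imports (pip install nltk), kept local so the pure
    # helpers in this sketch work without them.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    analyzer = SentimentIntensityAnalyzer()
    results = []
    for review in reviews:
        scores = analyzer.polarity_scores(review)  # keys: neg, neu, pos, compound
        compound = scores["compound"]
        results.append((review, compound, label_from_compound(compound)))
    return results


def sentiment_counts(labels):
    """Count how often each sentiment label occurs."""
    return Counter(labels)
```

With the results in hand, `sentiment_counts(label for _, _, label in results).most_common(1)` gives the sentiment used most often.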

This whole procedure helps us understand the users' sentiments and supports the analysis. We then use a wordcloud to check the most frequently used words in the extracted text.

image.png
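One way to check word frequency and draw the cloud from it is sketched below. The function names, image size, background colour and output file name are assumptions chosen for the example.

```python
from collections import Counter


def word_frequencies(reviews_tokens):
    """Count word occurrences across a list of tokenized reviews."""
    counts = Counter()
    for tokens in reviews_tokens:
        counts.update(tokens)
    return counts


def save_wordcloud(frequencies, path="wordcloud.png"):
    """Render a wordcloud image from a word -> count mapping."""
    from wordcloud import WordCloud  # third-party: pip install wordcloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(path)
    return path


freqs = word_frequencies([["good", "phone"], ["good", "camera"]])
print(freqs.most_common(2))
```

Counting first and drawing second keeps the frequency table available for inspection, which is often more precise than reading word sizes off the image.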

Alternate walkthrough: The stemmer used above produced some words that made little sense. As an alternative, we use the TF-IDF vectorizer to extract the important and relevant words, and clean them with stop word removal. We then build the wordcloud again and see that the words are now more meaningful to analyze.

image.png
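To show what the TF-IDF weighting does, here is a hand-rolled sketch of the plain formula, tf × idf with idf = log(N / df). It is not the vectorizer used in the notebook: scikit-learn's `TfidfVectorizer` adds smoothing and normalization by default, so its numbers will differ. The function names are made up for the example.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(scores, n=3):
    """Sum the scores over all documents and return the n highest terms."""
    totals = Counter()
    for doc_scores in scores:
        totals.update(doc_scores)  # adds the float scores per term
    return [term for term, _ in totals.most_common(n)]


docs = [["battery", "good"], ["camera", "good"], ["good", "phone"]]
print(top_terms(tfidf_scores(docs)))
```

A word such as "good" that appears in every review gets an idf of zero, which is why TF-IDF pushes generic words out and leaves the more distinctive ones for the wordcloud.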

Additionally, we can form wordclouds from only the positive or only the negative words.

image.png

image.png
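A simple way to separate the two groups is to look each word up in a valence lexicon, as in the sketch below. The function name is hypothetical and the sample lexicon values are made up for the demonstration. In practice the `lexicon` dictionary of NLTK's `SentimentIntensityAnalyzer` could be passed in instead.

```python
def split_by_polarity(words, lexicon):
    """Split words into positive and negative lists using a valence lexicon.

    lexicon maps a word to a score: above zero is positive, below is negative.
    Words missing from the lexicon are treated as neutral and dropped.
    """
    positive = [word for word in words if lexicon.get(word, 0.0) > 0]
    negative = [word for word in words if lexicon.get(word, 0.0) < 0]
    return positive, negative


# Made-up scores for illustration only, not real VADER values.
sample_lexicon = {"good": 1.9, "love": 3.2, "bad": -2.5, "cheap": -0.4}
words = ["good", "phone", "bad", "battery", "cheap", "love"]
print(split_by_polarity(words, sample_lexicon))
```

Each list can then be counted and drawn as its own wordcloud.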

A common question is whether we should use the positive or the negative words to analyze the text. We should always check both, and additionally use bigrams to check the context of the words. Only then do we see that a negative word is not always used in a negative context. For example, a word like “cheap” might be scored as negative, yet users may have meant that the product is cheap/cheaper for the quality being provided. Hence, a bigram analysis can help one understand the sentiments better. The link to the detailed description and code for the work above is given below for you to understand and work on. Please feel free to leave feedback, upvote the notebook, and use it to improve your own work.
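The bigram idea discussed above can be sketched with the standard library alone. The function names and the three sample reviews are invented for illustration. The point is only to show how counting word pairs reveals the context around a word like “cheap”.

```python
from collections import Counter


def bigrams(tokens):
    """Return consecutive word pairs from one tokenized review."""
    return list(zip(tokens, tokens[1:]))


def bigram_frequencies(reviews_tokens):
    """Count bigrams across all tokenized reviews."""
    counts = Counter()
    for tokens in reviews_tokens:
        counts.update(bigrams(tokens))
    return counts


def contexts_of(word, counts):
    """List the bigrams containing word, most frequent first."""
    return [(pair, n) for pair, n in counts.most_common() if word in pair]


# Invented sample reviews, already cleaned and tokenized.
reviews = [
    ["cheap", "price", "great", "quality"],
    ["cheap", "price", "fast", "delivery"],
    ["cheap", "plastic"],
]
print(contexts_of("cheap", bigram_frequencies(reviews)))
```

Joining each pair into one string, for example `"cheap_price"`, gives a frequency table that can be fed to a wordcloud in the same way as single words.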

Future work: Sentiment analysis is a vast area that is not limited to the analysis above. We can apply various techniques to text reviews, emails, call transcripts, etc., and work on areas such as product enhancement or sentiment analysis within a company, since it is impossible for anyone to read through hundreds or thousands of reviews, emails, or other sources of text data. The applications of sentiment analysis extend to social media monitoring, customer support, feedback, brand monitoring, product analysis, market research and many more.

Link to Kaggle Notebook: https://www.kaggle.com/singhnproud77/sentiment-analysis-on-amazon-reviews

Connect with me on various platforms through my profile: jaspreet-singh.netlify.app