Introduction

As part of a team, built NLP classification models to predict the helpfulness of Amazon product reviews using PySpark on a Hadoop cluster.

Amazon Review Analysis & Application

Project Objective:

To help Amazon optimize and improve its overall e-commerce experience from the perspective of both customers and sellers.

Project Approach:

Helpful reviews result in better sales performance, fewer returns, and a higher conversion rate.

👉

Predict whether a review is helpful (binary label: 0 or 1) before it gains any traction or receives customer votes on its helpfulness. This is a classification task.

In turn, Amazon can better understand how to rank its reviews, and sellers can identify helpful reviews and use them to improve their selling strategies and products.

Dataset:

Amazon Product Dataset: Amazon products across major categories with customer reviews, star ratings and helpful votes.

Original Variables

marketplace
customer_id
review_id
product_id
product_parent
product_title
product_category
star_rating
helpful_votes
total_votes
vine
verified_purchase
review_headline
review_body
review_date

Used Variables

review_headline
review_id
review_body
customer_id
product_id
star_rating
helpful_votes
total_votes
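
The helpful_votes and total_votes variables above are the natural source for the binary helpfulness label (0,1). This write-up does not state how the label was defined, so the following is only a minimal sketch under assumed rules: the function name, the minimum vote count, and the 0.5 ratio cutoff are all hypothetical. It is written in plain Python for readability; in a PySpark pipeline the same rule could be expressed as a column expression.

```python
from typing import Optional

# Assumed thresholds for illustration only; the project does not specify them.
MIN_TOTAL_VOTES = 5   # hypothetical: ignore reviews with too few votes to judge
HELPFUL_RATIO = 0.5   # hypothetical: share of helpful votes needed for label 1


def helpfulness_label(
    helpful_votes: int,
    total_votes: int,
    min_total_votes: int = MIN_TOTAL_VOTES,
    helpful_ratio: float = HELPFUL_RATIO,
) -> Optional[int]:
    """Return 1 (helpful), 0 (not helpful), or None when there are too few votes."""
    # max(..., 1) also guards against division by zero when total_votes is 0.
    if total_votes < max(min_total_votes, 1):
        return None
    return 1 if helpful_votes / total_votes >= helpful_ratio else 0


# Made-up example rows using the column names from the dataset.
reviews = [
    {"review_id": "R1", "helpful_votes": 9, "total_votes": 10},
    {"review_id": "R2", "helpful_votes": 1, "total_votes": 10},
    {"review_id": "R3", "helpful_votes": 1, "total_votes": 2},
]

labeled = [
    {**row, "label": helpfulness_label(row["helpful_votes"], row["total_votes"])}
    for row in reviews
]
```

Reviews that return None would be left out of training under this assumed rule, since a handful of votes says little about helpfulness.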
