Introduction
Built NLP classification models to predict helpfulness of Amazon product reviews using PySpark on Hadoop cluster in a team.
On this page
Amazon Review Analysis & Application
Project Objective:
To help amazon optimize and improve its overall ecommerce experience through the scope of customer and seller
Project Approach:
Helpful reviews result in better sales performance, less return, and a higher conversion rate.
Predict helpful reviews (0,1) before the review receives any traction and customers' votes on its helpfulness (classification).
In return, Amazon can better understand how to rank their reviews, and sellers can identify helpful reviews and use them to improve selling strategies and products.
Dataset:
Amazon Product Dataset: Amazon products across major categories with customer reviews, star ratings and helpful votes.
Original Variables
- marketplace
- customer_id
- review_id
- product_id
- product_parent
- product_title
- product_category
- star_rating
- helpful_votes
- total_votes
- vine
- verified_purchase
- review_headline
- review_body
- review_date
Used Variables
- review_headline
- review_id
- review_body
- customer_id
- product_id
- star_rating
- helpful_votes
- total_votes