Introduction

Help consumers and sellers establish a baseline for reasonable car price points by predicting used car prices.

Used Car Price Prediction

Project Objective:

Business Problem:

Asymmetric information, also known as “information failure,” takes place during a transaction where one party has greater material knowledge or better information than the other party.

“The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism,” by George Akerlof

Consumers face incomplete information on used cars. This information imbalance between sellers and buyers gives sellers more power to manipulate prices and puts buyers in an uncomfortable buying position.

Our Solution:

  • Help consumers stay informed on which features determine car price
  • Help consumers and sellers establish a baseline for reasonable car price points
  • Better information leads to better decisions on major purchases like cars

Project Approach:

Project Flow:

We used a dataset scraped from Craigslist and complemented it with an MSRP dataset. MSRP is the manufacturer’s suggested retail price. Our initial intention was to examine, during EDA, price depreciation per car and how original and resale prices differ.
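To make the depreciation idea concrete, here is a minimal sketch of how a per-car depreciation rate could be computed once a listing is paired with its MSRP. The function name and the example figures are hypothetical and are not taken from the project's data or code.

```python
def depreciation_rate(msrp: float, listing_price: float) -> float:
    """Fraction of the original MSRP lost by the time of the used-car listing.

    Hypothetical helper: assumes both prices are in the same currency.
    """
    if msrp <= 0:
        raise ValueError("msrp must be positive")
    return (msrp - listing_price) / msrp


# Made-up example: a car with a 30,000 MSRP listed used at 18,000.
rate = depreciation_rate(30_000, 18_000)
print(f"{rate:.0%}")  # prints "40%"
```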

One caveat of using an existing dataset was that it lacked data from recent years, so we used Scrapy, a web-scraping framework, to scrape the missing data from iSeeCars.com.

With three datasets to combine, we used Python’s difflib package and its SequenceMatcher class to match records across datasets by car model and manufacturer.
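The matching step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the helper names, the 0.8 threshold, and the sample "manufacturer model" strings are all hypothetical. It uses difflib's SequenceMatcher ratio to pick the closest candidate name and rejects weak matches.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to `name`, or None if no score reaches `threshold`.

    The default threshold is an assumed value and would need tuning on real data.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(name, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Hypothetical "manufacturer model" strings standing in for the MSRP dataset.
msrp_models = ["toyota camry", "honda civic", "ford f-150"]

print(best_match("Toyota Camry LE", msrp_models))  # prints "toyota camry"
print(best_match("tesla model 3", msrp_models))    # prints "None"
```

Combining manufacturer and model into a single string before comparing keeps identically named models from different manufacturers apart. A threshold set too low would pair unrelated cars, so unmatched rows are better left as None and reviewed separately.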

In A Nutshell:

As shown above, the project flow starts with data collection. We then perform EDA using statistical inference, clustering, and topic modeling on the textual columns. After data preprocessing, feature engineering, and feature selection, we run our selected models and improve their performance through re-engineering and hyperparameter tuning.