Pre-modelling

Initial outlier handling and filling in null values

Odometer:

  • Highly right-skewed
  • Outliers above the upper threshold (+7 × stdev) are removed
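
The odometer rule can be sketched in pandas as follows. This is a minimal illustration, not the original pipeline: the slide only says "+7 × stdev", so measuring the threshold from the mean is an assumption, as is the column name `odometer`. Rows with a missing odometer are kept here so the later null-handling steps can deal with them.

```python
import numpy as np
import pandas as pd


def remove_odometer_outliers(df: pd.DataFrame, k: float = 7.0) -> pd.DataFrame:
    """Drop rows whose odometer exceeds mean + k * std.

    Assumption: the threshold is measured from the mean; the source
    only states "+7 * stdev".
    """
    upper = df["odometer"].mean() + k * df["odometer"].std()
    # Keep missing odometer values; they are handled in a later step.
    keep = df["odometer"].isna() | (df["odometer"] <= upper)
    return df[keep].copy()


# Toy data: 99 ordinary readings, one extreme reading, one missing value.
demo_odo = pd.DataFrame({"odometer": [10_000.0] * 99 + [10_000_000.0, np.nan]})
cleaned_odo = remove_odometer_outliers(demo_odo)
```

Because the distribution is right-skewed, only an upper cut-off is applied; no lower threshold appears in the rule.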

Condition:

  • Fill nulls based on odometer, since cars in newer condition tend to have a low odometer reading
  • Odometer below the 25% quantile = Excellent
  • Odometer between the 25% and 50% quantiles = Good
  • Odometer above the 50% quantile = Fair
  • NA = Fair
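
A possible pandas version of the condition rule is shown below. The column names `odometer` and `condition`, the lowercase label spelling, and the reading of "NA = Fair" as "odometer is also missing" are assumptions made for the sketch.

```python
import numpy as np
import pandas as pd


def fill_condition(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing `condition` from odometer quantiles."""
    q25, q50 = df["odometer"].quantile([0.25, 0.50])
    inferred = np.select(
        [df["odometer"] < q25, df["odometer"] <= q50, df["odometer"] > q50],
        ["excellent", "good", "fair"],
        default="fair",  # reached only when odometer is NA (assumed meaning)
    )
    out = df.copy()
    # Only rows with a missing condition are overwritten.
    out["condition"] = out["condition"].fillna(pd.Series(inferred, index=df.index))
    return out


demo_cond = pd.DataFrame(
    {
        "odometer": [10.0, 20.0, 30.0, 40.0, 50.0, np.nan],
        "condition": [None, None, None, None, "good", None],
    }
)
filled_cond = fill_condition(demo_cond)
```

Existing condition values are left untouched; only nulls receive an inferred label.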

Transmission:

  • Drop NA rows (~0.6%)

Categorical variables:

  • Fill in NA with the mode
  • For cylinders, drive, type, fuel:
  • First, fill NA with the most common value among cars with the same model and manufacturer
  • Then, fill the remaining NA with the overall mode
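
The two-stage fill for cylinders, drive, type and fuel might look like the sketch below. The helper name `fill_with_group_mode` is hypothetical, the key columns `manufacturer` and `model` are assumed names, and ties between equally common values are broken by sort order, which the source does not specify.

```python
import pandas as pd


def fill_with_group_mode(df: pd.DataFrame, cols, keys=("manufacturer", "model")) -> pd.DataFrame:
    """Fill NA with the per-(manufacturer, model) mode, then the overall mode."""
    keys = list(keys)
    out = df.copy()
    for col in cols:
        # Most common known value per group; groups with no known value are absent.
        known = out.dropna(subset=[col])
        group_mode = known.groupby(keys)[col].agg(lambda s: s.mode().iloc[0])
        lookup = out.join(group_mode.rename("_group_mode"), on=keys)["_group_mode"]
        out[col] = out[col].fillna(lookup)
        # Anything still missing falls back to the overall mode of the column.
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


demo_cat = pd.DataFrame(
    {
        "manufacturer": ["ford", "ford", "ford", "toyota", "toyota", "bmw"],
        "model": ["f-150", "f-150", "f-150", "camry", "camry", "x5"],
        "drive": ["4wd", "4wd", None, "fwd", None, None],
    }
)
filled_cat = fill_with_group_mode(demo_cat, ["drive"])
```

In the toy data, the Ford and Toyota gaps are filled from their own group, while the BMW row has no group information and receives the overall mode.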

Created Features:

Color: 12 colors collapsed into a binary column, "is_neutral"

  • is_neutral (1): Black, White, Silver, Grey
  • is_neutral (0): all other (colorful) colors
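
A short sketch of the is_neutral flag, assuming the color lives in a column called `paint_color` (the slide does not name it) and that missing colors have already been filled:

```python
import pandas as pd

NEUTRAL_COLORS = {"black", "white", "silver", "grey"}


def add_is_neutral(df: pd.DataFrame, color_col: str = "paint_color") -> pd.DataFrame:
    """Add is_neutral: 1 for black/white/silver/grey, 0 for any other color."""
    out = df.copy()
    # Lowercase first so "Grey" and "grey" are treated the same.
    out["is_neutral"] = out[color_col].str.lower().isin(NEUTRAL_COLORS).astype(int)
    return out


demo_color = pd.DataFrame({"paint_color": ["black", "red", "Grey", "blue", "white"]})
flagged_color = add_is_neutral(demo_color)
```

Any color still missing at this point would be flagged 0 by this sketch.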

car_age:

  • Year of posting_date minus the year the car came out
  • is_vintage: car_age > 50
  • Accounts for vintage cars' higher prices due to rarity and originality
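
The two age features could be derived roughly as follows. The column name `year` for the year the car came out and the ISO-style timestamp format of `posting_date` are assumptions; timestamps are normalised to UTC before the year is extracted, which can shift postings made right around New Year.

```python
import pandas as pd


def add_age_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add car_age (posting year minus car year) and the is_vintage flag."""
    out = df.copy()
    posting_year = pd.to_datetime(out["posting_date"], utc=True).dt.year
    out["car_age"] = posting_year - out["year"]
    # Cars older than 50 years are flagged as vintage.
    out["is_vintage"] = (out["car_age"] > 50).astype(int)
    return out


demo_age = pd.DataFrame(
    {
        "posting_date": ["2021-05-04T12:31:18-0600", "2021-04-20T08:00:00-0500"],
        "year": [2015, 1965],
    }
)
aged = add_age_features(demo_age)
```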

Then, we converted the data to numeric format using a label encoder.
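
One way to apply a label encoder per column is sketched below with scikit-learn's `LabelEncoder`. The helper name `label_encode` and the choice of columns are illustrative assumptions; the slide does not say which library or columns were used.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def label_encode(df: pd.DataFrame, cols):
    """Replace each listed column with integer codes; return the fitted encoders too."""
    out = df.copy()
    encoders = {}
    for col in cols:
        enc = LabelEncoder()
        # Cast to str so mixed or missing values do not break the fit.
        out[col] = enc.fit_transform(out[col].astype(str))
        encoders[col] = enc  # keep for inverse_transform or for new data
    return out, encoders


demo_enc = pd.DataFrame({"fuel": ["gas", "diesel", "gas"], "drive": ["fwd", "4wd", "rwd"]})
encoded, encoders = label_encode(demo_enc, ["fuel", "drive"])
```

The integer codes follow the sorted order of the class labels, so they carry no meaningful ranking. Keeping the fitted encoders makes it possible to map codes back to the original categories.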