Pre-modelling
Initial Outlier handling & filling in null values
Odometer:
- Highly right skewed
- Outliers above the upper threshold (+7 × stdev) are removed
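
A minimal pandas sketch of the odometer filter. The notes do not say what the 7 × stdev is added to, so this assumes the threshold is the mean plus 7 standard deviations. The DataFrame `df`, its toy values, and the choice to keep rows with a missing odometer are also assumptions for illustration.

```python
import pandas as pd

# Toy data: 99 ordinary readings and one extreme reading (illustrative only).
df = pd.DataFrame({"odometer": [10_000.0] * 99 + [10_000_000.0]})

# Assumed rule: upper threshold = mean + 7 * standard deviation.
upper = df["odometer"].mean() + 7 * df["odometer"].std()

# Keep rows at or below the threshold; rows with a missing odometer are kept too.
filtered = df[(df["odometer"] <= upper) | df["odometer"].isna()]
```

Only an upper cut is applied because the distribution is right-skewed, so the extreme values sit in the upper tail.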
Condition:
- Fill nulls based on odometer
- Quantile <25% = Excellent
- Quantile 25%-50% = Good
- Quantile >50% = Fair
- NA = Fair
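
The quantile rule above could be sketched as follows. The lowercase labels, the toy data, and the reading of "NA = Fair" as "rows whose odometer is also missing get Fair" are assumptions, not confirmed by the notes.

```python
import pandas as pd

# Toy data; column names follow the notes, values are illustrative.
df = pd.DataFrame({
    "odometer": [10_000, 50_000, 90_000, 130_000, None],
    "condition": [None, None, "good", None, None],
})

# Quantiles are computed on the non-missing odometer values.
q25 = df["odometer"].quantile(0.25)
q50 = df["odometer"].quantile(0.50)

def condition_from_odometer(odometer):
    # Missing odometer falls back to "fair" (assumed reading of "NA = Fair").
    if pd.isna(odometer):
        return "fair"
    if odometer < q25:
        return "excellent"
    if odometer <= q50:
        return "good"
    return "fair"

# Only rows with a missing condition are filled; existing labels are untouched.
inferred = df["odometer"].apply(condition_from_odometer)
df["condition"] = df["condition"].fillna(inferred)
```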
Transmission:
- Drop NA rows (~0.6%)
Categorical variables:
- Fill in NA with mode
- For cylinders, drive, type, fuel: fill NA with the most common value for the matching model and manufacturer, then fill the remaining NA with the mode
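
A sketch of the two-step fill for one of these columns, assuming pandas and columns named `manufacturer`, `model`, and `drive`. The toy data and the helper name `group_mode` are illustrative, not taken from the project code.

```python
import pandas as pd

df = pd.DataFrame({
    "manufacturer": ["ford", "ford", "ford", "toyota", "toyota"],
    "model": ["f-150", "f-150", "f-150", "camry", "corolla"],
    "drive": ["4wd", "4wd", None, "fwd", None],
})

keys = ["manufacturer", "model"]
for col in ["drive"]:  # the notes apply this to cylinders, drive, type, fuel
    # Most common non-null value per (manufacturer, model) group.
    group_modes = (
        df.dropna(subset=[col])
        .groupby(keys)[col]
        .agg(lambda s: s.mode().iloc[0])
        .rename("group_mode")
    )
    # Step 1: fill from the matching group; groups with no known value stay NA.
    by_model = df.join(group_modes, on=keys)["group_mode"]
    df[col] = df[col].fillna(by_model)
    # Step 2: fill whatever is still missing with the column-wide mode.
    df[col] = df[col].fillna(df[col].mode().iloc[0])
```

Here the third F-150 row takes "4wd" from its group, while the Corolla row has no group value to borrow and falls back to the overall mode.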
Created Features:
Color: 12 colors reduced to a binary column “is_neutral”
- is_neutral (1): Black, White, Silver, Grey
- is_neutral (0): Colorful (all other colors)
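
A short pandas sketch of this mapping. The column name `paint_color` and the lowercase color strings are assumptions. Note that in this sketch a missing color would also map to 0.

```python
import pandas as pd

# Toy data; the column name is an assumption.
df = pd.DataFrame({"paint_color": ["black", "red", "silver", "blue", "grey", "white"]})

# The four colors the notes treat as neutral.
NEUTRAL = {"black", "white", "silver", "grey"}

# 1 for neutral colors, 0 for everything else.
df["is_neutral"] = df["paint_color"].str.lower().isin(NEUTRAL).astype(int)
```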
car_age:
- Year of posting_date minus the year the car came out
- is_vintage: car_age > 50
- To account for vintage cars’ higher price due to rarity and originality
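
Both features can be derived in a few lines. This sketch assumes `posting_date` is a parseable date string and that the year the car came out is stored in a column named `year`. The toy values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "posting_date": ["2021-04-15", "2021-05-01", "2021-05-03"],
    "year": [2015, 1965, 1971],
})

# Year of the posting date minus the car's year.
posting_year = pd.to_datetime(df["posting_date"]).dt.year
df["car_age"] = posting_year - df["year"]

# Strictly greater than 50, as in the rule above; a car aged exactly 50 is not vintage.
df["is_vintage"] = (df["car_age"] > 50).astype(int)
```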
Then we converted the categorical columns to numeric format using a label encoder.
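
The notes do not say which label encoder was used; scikit-learn's `LabelEncoder` is a common choice. The sketch below uses pandas category codes as a stand-in. Like `LabelEncoder`, it assigns integers to the sorted unique values of each column. The column list and toy data are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "fuel": ["gas", "diesel", "gas", "electric"],
    "drive": ["fwd", "4wd", "rwd", "fwd"],
    "car_age": [6, 12, 3, 1],
})

# Columns to encode, listed explicitly for this illustration.
categorical_cols = ["fuel", "drive"]

for col in categorical_cols:
    # Each unique string becomes an integer code, assigned in sorted order.
    df[col] = df[col].astype("category").cat.codes
```

Label encoding imposes an arbitrary integer order on the categories. Tree-based models usually tolerate this, while linear models may read the order as meaningful.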