1985 Cars Data Cleaning

1985 Cars Data Cleaning

This project focused on cleaning and normalizing a classic automotive dataset from 1985. The dataset included specifications for various cars such as horsepower, fuel type, origin, and mileage (mpg). The raw data had several inconsistencies — missing values, improper data types, and duplicate entries.

Using SQL queries in BigQuery, I performed data cleaning by:

  • Replacing ? placeholder values with NULLs
  • Casting numeric fields like horsepower and mpg into proper data types
  • Removing or deduplicating rows with conflicting entries
  • Normalizing text-based categories for consistency

After cleaning, the dataset became more reliable for future analysis. This foundational step ensures better results in future data visualization and modeling tasks.

Tools Used

  • SQL
  • Google BigQuery

Dataset Source

Provided by the Google Data Analytics Professoinal Certificate Program on Coursera. According to the course, this is data from an external source that contains historical sales data on car prices and their features in the year 1985.