
Hotel Bookings Data Cleaning
This project demonstrates how I cleaned a simulated hotel bookings dataset using R within RStudio/Posit Cloud, as part of the Google Data Analytics Capstone course.
Dataset Source
Provided by the Google Data Analytics Professional Certificate Program on Coursera.
Note: According to the course scenario, the dataset was compiled from two hotel systems and exported as a .csv file. The data contained inconsistencies such as null values, misnamed columns, and type mismatches, and it was intended for data cleaning practice using R and RStudio. The entire course assignment was hosted in Posit Cloud, a cloud-based version of RStudio. However, as of July 2025, Posit Cloud deprecated publishing for these projects, so the published project is no longer accessible. I plan to recreate it in the desktop version of RStudio and push it to GitHub as a repo in the future.
Tools & Packages Used
- 📦 tidyverse: data wrangling and manipulation
- 📦 skimr: quick data overviews
- 📦 janitor: cleaning and renaming column names
Key Cleaning Tasks
- Imported the .csv file using read_csv()
- Previewed structure with head(), str(), glimpse(), and skim_without_charts()
- Selected and renamed relevant columns for clarity (hotel, lead_time, etc.)
- Created derived fields (e.g., total guests per booking)
- Combined year and month columns for date analysis
- Summarized basic metrics (e.g., number of cancellations, average lead time)
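The tasks above can be sketched roughly as follows. This is a minimal illustration, not the original course script: the inline sample rows, the column names (is_canceled, arrival_date_year, arrival_date_month, adults, children, babies), and the new names hotel_type, total_guests, and arrival_month_year are assumptions made for the example. In the real project, read_csv() would point at the exported .csv file instead of inline text.

```r
library(tidyverse)  # readr, dplyr, tidyr
library(skimr)
library(janitor)

# Tiny stand-in for the course file; column names and values are assumed.
raw_csv <- I("hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,adults,children,babies
Resort Hotel,0,342,2015,July,2,0,0
City Hotel,1,85,2015,July,2,1,0
City Hotel,0,14,2016,August,1,NA,0
")

# Import and standardize column names (lowercase snake_case).
bookings <- read_csv(raw_csv, show_col_types = FALSE) %>%
  clean_names()

# Preview the structure before changing anything.
head(bookings)
glimpse(bookings)
skim_without_charts(bookings)

cleaned <- bookings %>%
  # Keep only the columns needed for this sketch.
  select(hotel, is_canceled, lead_time, arrival_date_year,
         arrival_date_month, adults, children, babies) %>%
  # Rename for clarity (hotel_type is an illustrative name).
  rename(hotel_type = hotel) %>%
  # Derived field: total guests, treating a missing children count as 0.
  mutate(total_guests = adults + coalesce(children, 0) + babies) %>%
  # Combine month and year into a single column for date analysis.
  unite("arrival_month_year", arrival_date_month, arrival_date_year, sep = " ")

# Basic metrics: number of cancellations and average lead time.
summary_stats <- cleaned %>%
  summarize(
    number_canceled = sum(is_canceled),
    average_lead_time = mean(lead_time)
  )

print(summary_stats)
```

Treating a missing children value as zero is a choice made for this sketch; depending on the analysis, dropping or flagging those rows may be more appropriate.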
This project focused primarily on data cleaning and preparation, but the cleaned dataset is ready for deeper analysis, such as:
- Booking seasonality
- Cancellation trends
- Hotel occupancy patterns