matanask/hotelsnew
Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/matanask/hotelsnew
Description:
---
language:
- en
pretty_name: "Hotel Booking Cancellation Analysis"
tags:
- tabular
- exploratory-data-analysis
- pandas
license: "cc-by-4.0"
task_categories:
- tabular-classification
configs:
- config_name: default
  data_files:
  - split: train
    path: data.csv
---
# Hotel Booking Cancellation Analysis
## Overview
This project analyzes hotel booking data with the goal of understanding customer behavior, especially cancellations.
The main question behind this analysis is simple:
**What drives customers to cancel their bookings?**
By exploring patterns in the data, we can better understand how timing, pricing, and customer type affect cancellation behavior.
The dataset was taken from Kaggle ("Hotel Booking Demand Dataset") and was cleaned and analyzed using Python.
---
## 🎥 Project Presentation Video
<video src="https://huggingface.co/datasets/matanask/hotelsnew/resolve/main/video.mp4" controls="controls" style="max-width: 720px;"></video>
If the video does not load properly, please use the direct link below:
👉 [Watch the video here](https://huggingface.co/datasets/matanask/hotelsnew/blob/main/video.mp4)
## 📁 Notebook
The complete workflow of this project — from data cleaning to exploratory data analysis and insights — can be found in the notebook below:
👉 [View the notebook](https://huggingface.co/datasets/matanask/hotelsnew/blob/main/hotel_booking_eda.ipynb)
## Data Cleaning
Before starting the analysis, several steps were taken to ensure the data is reliable:
- Checked for missing values across all columns
- Verified and removed duplicate rows
- Reviewed data types and general structure (`df.info()` and `df.describe()`)
- Looked at distributions to understand unusual values
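
The checks listed above could look roughly like the sketch below. It is not the notebook's actual code: the helper name `basic_clean` and the toy frame are hypothetical, and the column names are assumed to match the original Kaggle file.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Report missing values, drop exact duplicate rows, and print a summary."""
    # Count missing values per column and show only columns that have any.
    missing = df.isna().sum()
    print(missing[missing > 0])

    # Count fully duplicated rows, then remove them.
    n_dupes = int(df.duplicated().sum())
    print(f"duplicate rows: {n_dupes}")
    cleaned = df.drop_duplicates().reset_index(drop=True)

    # Review data types and summary statistics.
    cleaned.info()
    print(cleaned.describe())
    return cleaned


# Toy frame with made-up values, used only to show the calls.
toy = pd.DataFrame({
    "lead_time": [10, 10, 200, 35],
    "adr": [80.0, 80.0, 120.5, None],
    "is_canceled": [0, 0, 1, 0],
})
cleaned = basic_clean(toy)
```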
Outliers were found in variables such as **lead time** and **ADR (average daily rate)**.
Instead of removing them completely, I chose to keep most of them.
These values likely represent real-world scenarios (for example, very early bookings or expensive stays).
However, for visualization purposes, extreme values were sometimes limited in order to make the graphs clearer and easier to interpret.
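
One way to limit extreme values for plotting only is to cap them at a high quantile while leaving the underlying data untouched. The text does not say which method or threshold was used, so the 99th percentile and the helper name `clip_for_plot` below are assumptions made for illustration.

```python
import pandas as pd


def clip_for_plot(values: pd.Series, upper_q: float = 0.99) -> pd.Series:
    """Return a copy with values above the upper_q quantile capped at that quantile."""
    cap = values.quantile(upper_q)
    return values.clip(upper=cap)


# Made-up ADR values with one extreme stay; not taken from the dataset.
adr = pd.Series([50.0, 60.0, 70.0, 80.0, 5000.0], name="adr")

# Use the capped copy for charts; keep `adr` itself for the analysis.
adr_for_plot = clip_for_plot(adr)
```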
---
## Target Variable
The main variable analyzed in this project is:
**`is_canceled`**
- 0 = Not canceled
- 1 = Canceled
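
Because the target is coded as 0/1, its mean equals the share of canceled bookings, which is what every cancellation rate in this analysis relies on. A minimal sketch with made-up labels (not real results):

```python
import pandas as pd

# Made-up labels for illustration only; not taken from the dataset.
is_canceled = pd.Series([0, 1, 0, 0], name="is_canceled")

# With 0/1 coding, the mean is the share of canceled bookings.
cancel_rate = is_canceled.mean()

# The same information expressed as a share per class.
class_shares = is_canceled.value_counts(normalize=True)

print(cancel_rate)
print(class_shares)
```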
---
## Question 1: Do cancellation rates differ between city hotels and resort hotels?

City hotels show a noticeably higher cancellation rate compared to resort hotels.
This makes sense when thinking about customer behavior:
City bookings are often more flexible and sometimes made for short-term plans, which can change easily.
Resort bookings, on the other hand, are usually planned in advance (vacations), so customers are more committed.
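
A comparison like this can be computed with a single group-by. The sketch below assumes the column is named `hotel` with the values `City Hotel` and `Resort Hotel`, as in the original Kaggle file; the helper name and the toy rows are hypothetical and do not reproduce the real rates.

```python
import pandas as pd


def cancellation_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of canceled bookings for each value of `column`, highest first."""
    return df.groupby(column)["is_canceled"].mean().sort_values(ascending=False)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": [1, 0, 0, 0],
})
rates = cancellation_rate_by(toy, "hotel")
print(rates)
```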
---
## Question 2: Are bookings with a longer lead time more likely to be canceled?

The data shows that canceled bookings tend to have a higher average lead time.
In other words, customers who book far in advance are more likely to cancel later.
This is a logical pattern:
The more time passes between booking and arrival, the higher the chance that plans change.
Still, it is important to note that this is a trend — not every early booking is canceled.
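
The averages behind this comparison can be sketched as follows. The column name `lead_time` is assumed from the original Kaggle file, and the toy values are made up. Reporting the median next to the mean is a suggestion here, since lead time contains outliers.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "lead_time": [5, 20, 35, 150, 250],
    "is_canceled": [0, 0, 0, 1, 1],
})

# Mean, median, and number of bookings for kept (0) and canceled (1) bookings.
summary = toy.groupby("is_canceled")["lead_time"].agg(["mean", "median", "count"])
print(summary)
```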
---
## Additional Insight: How does cancellation rate change across lead time ranges?

Breaking lead time into groups gives a clearer picture.
Instead of looking only at averages, we can see how cancellation behaves step by step.
The pattern is clear:
As lead time increases, cancellation rate also increases.
This confirms the previous finding and adds more depth to the analysis.
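
Grouping lead time into ranges can be done with `pd.cut`. The bin edges and labels below are assumptions chosen for illustration; the ranges used in the notebook may differ, and the toy rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "lead_time": [0, 10, 45, 60, 200, 400],
    "is_canceled": [0, 0, 0, 1, 1, 1],
})

# Hypothetical ranges in days; right=False makes each bin [lower, upper).
edges = [0, 30, 90, 180, 365, float("inf")]
labels = ["0-29", "30-89", "90-179", "180-364", "365+"]
lead_bin = pd.cut(toy["lead_time"], bins=edges, labels=labels, right=False)

# Cancellation rate per range; observed=True skips ranges with no bookings.
rates = toy.groupby(lead_bin, observed=True)["is_canceled"].mean()
print(rates)
```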
---
## Question 3: Do repeated guests cancel less than first-time guests?

Repeated guests have a noticeably lower cancellation rate than first-time guests.
This suggests that customers who are already familiar with the hotel are more confident in their booking.
First-time guests, on the other hand, may still be comparing options or uncertain about their plans.
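
A row-normalized cross-tabulation shows the share of kept and canceled bookings for each guest type side by side. The column name `is_repeated_guest` is assumed from the original Kaggle file, and the toy rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "is_repeated_guest": [0, 0, 0, 0, 1, 1],
    "is_canceled": [1, 1, 0, 0, 0, 0],
})

# normalize="index" makes each row sum to 1, so column 1 is the cancellation rate.
table = pd.crosstab(toy["is_repeated_guest"], toy["is_canceled"], normalize="index")
print(table)
```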
---
## Question 4: How does average daily rate (ADR) relate to cancellation behavior?

There is a slight tendency for higher-priced bookings to be canceled more often.
This could mean that customers are more sensitive when prices are high, or that expensive bookings are more likely to be reconsidered.
The difference is not extreme, but it is noticeable.
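
One way to look at this relationship is to split ADR into quartiles and compare the cancellation rate in each. Using quartiles is an assumption for illustration, not necessarily the approach taken in the notebook; the column name `adr` is assumed from the original Kaggle file and the toy rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "adr": [50, 60, 70, 80, 90, 100, 110, 120],
    "is_canceled": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Four equal-sized price groups; qcut raises a ValueError if quantile edges are not unique.
adr_quartile = pd.qcut(toy["adr"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Cancellation rate within each price quartile.
rates = toy.groupby(adr_quartile, observed=True)["is_canceled"].mean()
print(rates)
```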
---
## Key Insights
- Longer lead times are associated with a higher likelihood of cancellation
- Repeated guests cancel much less often than first-time guests
- City hotels experience higher cancellation rates than resort hotels
- Higher prices (ADR) show a slight association with more cancellations
Overall, cancellations are not random.
They are influenced by timing, experience, and pricing.
---
## Business Recommendations
Based on the findings:
- Encourage shorter lead-time bookings when possible
- Offer benefits or incentives for returning customers
- Consider stricter cancellation policies for early bookings
- Monitor pricing strategies, especially for high-value bookings
These steps can help reduce uncertainty and improve planning.
---
## Dataset Source & License
The dataset used in this project is:
**Hotel Booking Demand Dataset (Kaggle)**
https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand
The data was used for educational purposes only.
It was cleaned, modified, and analyzed as part of an academic assignment.
No ownership is claimed over the original dataset.
---
Provider: matanask



