
rotemvahava/airbnb-global-market-analysis

Source: Hugging Face | Updated: 2026-04-12 | Indexed: 2026-04-26
Download link: https://hf-mirror.com/datasets/rotemvahava/airbnb-global-market-analysis
Description:
---
pretty_name: Global Airbnb Market Analysis
license: mit
task_categories:
- tabular-regression
- tabular-classification
tags:
- real-estate
- tourism
- global-market
- economics
configs:
- config_name: default
  data_files:
  - split: train
    path: airbnb_top_cities_final.csv
---

# Global Airbnb Market Analysis: Pricing and Host Dynamics

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1GVyS4sLAOUy0mQTX8RgNNfC-NMeLx51w)

<video src="https://huggingface.co/datasets/rotemvahava/airbnb-global-market-analysis/resolve/main/final_video_rotem.mp4" controls="controls" style="max-width: 720px; width: 100%;"></video>

***

## Repository Contents

| File | Description |
| :--- | :--- |
| **`airbnb_top_cities_final.csv`** | The final, cleaned dataset used for this analysis (prices normalized to USD). |
| **`Airbnb_Market_Analysis_EDA.ipynb`** | The full Python notebook containing all cleaning code and visualizations. |
| **[Google Colab Notebook](https://colab.research.google.com/drive/1GVyS4sLAOUy0mQTX8RgNNfC-NMeLx51w)** | Direct link to the live, interactive research environment. |
| **`README.md`** | This document, providing the project overview and key findings. |

***

## Dataset Overview

This dataset supports an exploratory analysis of the global short-term rental market, focusing on five major international hubs: New York, London, Paris, Bangkok, and Sydney. The project turns raw Airbnb data into business insights through data cleaning, currency normalization, and statistical visualizations.

### Research Goal

The primary objective of this project is to decode the pricing mechanisms of the global short-term rental market. By analyzing real-world data from diverse international hubs, we aim to identify the key factors (geographic, operational, and popularity-based) that allow hosts to optimize their revenue and market positioning.

### Main Research Question

What are the primary drivers influencing the nightly rental price of an Airbnb listing?

***

## Data Dictionary (20 Variables)

### 1. Categorical and Geographic Variables

* **city**: The global market where the listing is located.
* **room_type**: The level of privacy (Entire home/apt, Private room, Hotel room, or Shared room).
* **neighbourhood_group**: Broader administrative region or borough.
* **neighbourhood**: Specific local district within the city.
* **latitude / longitude**: Exact geographic coordinates for spatial analysis.

### 2. Numerical and Operational Metrics

* **price**: The nightly rental cost, normalized to USD.
* **minimum_nights**: The minimum number of nights per booking, reflecting the host's stay-duration strategy.
* **calculated_host_listings_count**: Total number of properties managed by a single host.
* **availability_365**: Number of days per year the property is open for booking.

### 3. Engagement and Popularity Metrics

* **number_of_reviews**: Total historical review count.
* **number_of_reviews_ltm**: Reviews received in the last twelve months.
* **reviews_per_month**: Average number of reviews per month, used as a proxy for guest turnover.
* **last_review**: The date the property was most recently reviewed.

### 4. Identifiers and Metadata

* **id**: Unique numerical identifier for each Airbnb listing.
* **name**: Descriptive title of the listing.
* **host_id**: Unique identifier for the host.
* **host_name**: First name of the host or the company's name.
* **license**: Official license or registration number.
* **scrape_date**: The date when the data was collected.
* **host_type**: Categorization of hosts as "Single Listing" or "Commercial".

***

## Data Cleaning and Preprocessing

To ensure the integrity of the analysis, the following cleaning pipeline was applied:

* **Missing Value Management**: Records with missing prices or critical popularity metrics were removed.
* **Currency Normalization (The Bangkok Correction)**: An audit revealed that Bangkok prices were listed in Thai Baht (THB). A conversion factor of 0.028 (USD per THB) was applied to these prices so that all data is expressed in USD.
* **Outlier Handling**: Extremely high nightly rates were removed based on the Interquartile Range (IQR). Additionally, listings requiring a minimum stay of 365 or more nights were excluded to focus strictly on the short-term market.

***

## Key Research Findings & Visual Proof

### 1. The Global Footprint

This map establishes our geographic baseline, showing the density of listings across the five international hubs analyzed.

![Global Density Map](https://huggingface.co/datasets/rotemvahava/airbnb-global-market-analysis/resolve/main/map.png)

### 2. The Bangkok Anomaly (Data Auditing)

**The Discovery:** Initial analysis showed Bangkok as a massive luxury outlier, with prices appearing 30x higher than New York.

**The Fix:** Normalizing the currency from THB to USD revealed its true position as the most affordable market in the study.

![Bangkok Anomaly](https://huggingface.co/datasets/rotemvahava/airbnb-global-market-analysis/resolve/main/bangkok_anomaly.png)

### 3. The Occupancy Hypothesis: The "Trust Threshold"

**Insight:** The relationship between reviews and occupancy is non-linear and weaker than anticipated. While moving from 'Low' to 'Average' popularity provides a slight boost, the trend plateaus in the 'High' tiers.

**Conclusion:** Review volume is a **Trust Threshold**: it helps a listing secure its initial market entry, but beyond a certain point it does not drive continuous occupancy growth.

![Occupancy Barplot](https://huggingface.co/datasets/rotemvahava/airbnb-global-market-analysis/resolve/main/occupancy_barplot.png)

### 4. The Privacy Premium

**Insight:** Room type is the most significant predictor of price across all cities. This boxen plot shows that "Entire homes" consistently command the highest market premiums regardless of location.

![Price by Room Type](https://huggingface.co/datasets/rotemvahava/airbnb-global-market-analysis/resolve/main/boxen_plot.png)

### 5. Host Professionalization

**Insight:** Comparing Single Listing and Commercial hosts shows similar median prices, but commercial hosts dominate the extreme luxury tier, managing the majority of high-priced outliers.

![Host Segment Analysis](https://huggingface.co/datasets/rotemvahava/airbnb-global-market-analysis/resolve/main/violin_plot.png)

***

## Addressing the Research Question: The Verdict

Our research provides a clear answer to the initial question: **Physicality and Location dominate Social Proof.**

1. **Drivers of Price:** Nightly pricing is almost exclusively driven by **Room Type** and **City Location**. No significant "Social Proof" premium exists; highly reviewed properties do not necessarily cost more.
2. **The Role of Social Proof:** Reviews function as a **risk-reduction tool** for guests rather than a pricing lever for hosts. They ensure a listing meets a "trust standard," but once that standard is met, they do not give the host the power to raise prices significantly.
3. **Strategic Conclusion:** To optimize revenue, hosts should focus on property upgrades and location-based marketing rather than solely chasing high review volumes.

***

## Research Gaps & Limitations

* **Review Sentiment vs. Volume**: We measured review quantity, not quality (positive vs. negative content).
* **Seasonality**: The data is a static snapshot and does not account for holiday price surges or seasonal shifts.

***

**Author**: Rotem Vahava

**Institution**: Reichman University

**Project**: Assignment 1 EDA & Dataset Creation
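
As an illustration of the steps listed under Data Cleaning and Preprocessing, the sketch below applies them to a handful of made-up rows. It is not the notebook's code: it uses plain Python so it runs without dependencies, the sample rows and helper names (`normalize_price`, `iqr_upper_fence`, `clean`) are hypothetical, and the choices of a single global upper fence at Q3 + 1.5 × IQR and of checking only `price` for missing values are assumptions, since the README does not state these details. Only the 0.028 conversion factor and the 365-night cutoff come from the text above.

```python
from statistics import quantiles

THB_TO_USD = 0.028  # conversion factor stated in the README


def normalize_price(row):
    """Return the nightly price in USD; only Bangkok rows are converted from THB."""
    if row["city"] == "Bangkok":
        return row["price"] * THB_TO_USD
    return row["price"]


def iqr_upper_fence(values, k=1.5):
    """Upper outlier fence Q3 + k * IQR (k=1.5 is a common convention, assumed here)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def clean(rows, max_min_nights=365):
    """Apply the four cleaning steps in order and return the surviving rows."""
    # 1. Drop records with a missing price (other fields are not checked here).
    rows = [r for r in rows if r["price"] is not None]
    # 2. Express every price in USD.
    rows = [{**r, "price": normalize_price(r)} for r in rows]
    # 3. Exclude listings that require a stay of 365 or more nights.
    rows = [r for r in rows if r["minimum_nights"] < max_min_nights]
    # 4. Remove extremely high prices using one fence over all cities (assumption).
    fence = iqr_upper_fence([r["price"] for r in rows])
    return [r for r in rows if r["price"] <= fence]


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"city": "New York", "price": 150, "minimum_nights": 2},
    {"city": "London", "price": 120, "minimum_nights": 3},
    {"city": "Paris", "price": 110, "minimum_nights": 1},
    {"city": "Sydney", "price": 130, "minimum_nights": 2},
    {"city": "Bangkok", "price": 1500, "minimum_nights": 1},   # listed in THB
    {"city": "New York", "price": 5000, "minimum_nights": 1},  # extreme rate
    {"city": "London", "price": None, "minimum_nights": 2},    # missing price
    {"city": "Paris", "price": 100, "minimum_nights": 365},    # long-term stay
]

cleaned = clean(sample_rows)
for row in cleaned:
    print(row["city"], round(row["price"], 2))
```

Converting the currency before computing the fence matters: with Bangkok still in THB, its prices would inflate the upper quartile and distort which listings count as outliers.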
Provider: rotemvahava