crawlfeeds/Booking-Hotel-Reviews-Dataset
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/crawlfeeds/Booking-Hotel-Reviews-Dataset
Resource overview:
---
language:
- en
- multilingual
license: cc-by-nc-4.0
task_categories:
- text-classification
- text-generation
- feature-extraction
- zero-shot-classification
- summarization
pretty_name: Booking.com Hotel Reviews Dataset – 4.3K Sample
size_categories:
- 1K<n<10K
tags:
- booking
- hotel-reviews
- travel
- hospitality
- sentiment-analysis
- nlp
- reviews
- ecommerce
- llm-training
- ai-training
- fine-tuning
- aspect-based-sentiment
- opinion-mining
- recommendation-systems
- multilingual
- travel-ai
configs:
- config_name: default
  data_files:
  - split: train
    path: Crawlfeeds_booking-italy-hotel-reviews__limit-100000_language-en_after-2025-06_reviewer_country-3-items_20260410_063921.csv
---
# Booking.com Hotel Reviews Dataset – 4.3K Sample
A rich, structured dataset of hotel reviews collected from Booking.com, featuring a unique split of positive and negative review text, reviewer country, stay dates, traveler tags, and hotel location data. Ideal for sentiment analysis, aspect-based opinion mining, travel AI, hospitality recommendation systems, and LLM fine-tuning on real-world review data.
---
## Dataset Overview
| Field | Details |
|-------|---------|
| **Source** | Booking.com |
| **Total Records** | 4,300+ |
| **Category Focus** | Hotel Reviews, Travel, Hospitality |
| **Languages** | Multilingual (language field included) |
| **Formats** | CSV / JSON |
| **Data Quality** | Validated and structured |
| **Provider** | [Crawl Feeds](https://crawlfeeds.com) |
---
## What Makes This Dataset Valuable
- **Split positive and negative review text** — unlike most review datasets that provide a single text blob, this dataset separates positive and negative feedback into distinct fields, enabling precise aspect-based sentiment analysis and fine-grained opinion mining out of the box
- **Traveler tags** — structured tags per review (e.g. "Solo traveler", "Couple", "Family with kids", "Business trip") provide rich contextual labels for segmentation models and personalized recommendation systems
- **Reviewer country** — 99.9% coverage on reviewer nationality enables cross-cultural sentiment analysis and geo-targeted hospitality intelligence
- **Stay date vs review date** — both `stayed_at` and `reviewed_at` fields are present, enabling temporal analysis of review lag and seasonal sentiment patterns
- **Hotel location data** — hotel name, address, and country fields allow geographic clustering and location-based recommendation modeling
- **Multilingual coverage** — language field included, enabling multilingual NLP model training and evaluation
- **Helpful count** — social validation signal available in full dataset for review quality modeling
---
## Data Fields
| Field | Type | Coverage | Description |
|-------|------|----------|-------------|
| url | String | 100% | Direct review page URL |
| hotel_name | String | 100% | Name of the reviewed hotel |
| hotel_address | String | 100% | Full hotel address |
| country | String | 100% | Country where the hotel is located |
| average_score | Float | 0% | Hotel average score (populated only in the full dataset) |
| review_title | String | 100% | Title of the review |
| reviewer_name | String | 98.3% | Name of the reviewer |
| rating | Float | 0% | Numerical rating (populated only in the full dataset) |
| reviewer_country | String | 99.9% | Country of the reviewer |
| negative_review_text | String | 35.9% | Reviewer's negative feedback |
| positive_review_text | String | 55.8% | Reviewer's positive feedback |
| helpful_count | Int | 0% | Number of helpful votes (populated only in the full dataset) |
| reviewed_at | String | 100% | Date the review was submitted |
| stayed_at | String | 100% | Date of the actual stay |
| tags | String | 100% | Traveler type and trip context tags |
| source | String | 100% | Source platform name |
| source_domain | String | 100% | Source platform domain |
| language | String | 100% | Language of the review |
| uniq_id | String | 100% | Unique record identifier |
| scraped_at | String | 100% | Data collection timestamp |
---
## Use Cases
### Sentiment Analysis & Opinion Mining
- **Aspect-based sentiment analysis** — use the positive/negative split to train models that identify sentiment by hotel aspect (cleanliness, location, service, value)
- **Fine-grained opinion mining** — the dual-text structure removes the need for manual sentiment labeling, providing ready-to-use positive/negative training pairs
- **Zero-shot sentiment classification** — use review titles and text for zero-shot benchmarking of LLMs on hospitality domain
- **Multilingual sentiment models** — leverage the language field to build and evaluate cross-lingual sentiment classifiers
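
As a rough sketch of how the positive/negative split could be turned into training examples, the helper below flattens each record into one labeled example per non-empty text field. The function name `to_sentiment_examples` and the `"positive"`/`"negative"` label strings are illustrative choices, not part of the dataset. The labels are weak labels derived from which field the reviewer typed into, so they may still need spot-checking.

```python
from typing import Any, Dict, Iterable, List

# Maps each text field (from the Data Fields table) to a hypothetical label.
FIELD_TO_LABEL = {
    "positive_review_text": "positive",
    "negative_review_text": "negative",
}


def to_sentiment_examples(records: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Flatten review records into one {"text", "label"} example per filled field."""
    examples = []
    for rec in records:
        for field, label in FIELD_TO_LABEL.items():
            text = rec.get(field)
            # Missing values may arrive as None or NaN (a float), so only
            # non-empty strings are kept.
            if isinstance(text, str) and text.strip():
                examples.append({"text": text.strip(), "label": label})
    return examples
```

With a pandas frame such as the one built under "Loading the Dataset", `to_sentiment_examples(df.to_dict("records"))` would be one way to apply it.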
### Travel & Hospitality AI
- **Hotel recommendation systems** — combine ratings, tags, reviewer country, and hotel location for collaborative and content-based filtering
- **Personalized travel AI** — traveler type tags enable user segmentation for personalized hotel recommendations
- **Review summarization** — train abstractive summarization models to condense positive and negative feedback into concise hotel summaries
- **Chatbot training for hospitality** — use review text to build domain-specific conversational AI for hotel booking assistants
### NLP & LLM Training
- **LLM fine-tuning on review data** — high-quality, real-world hospitality text for domain adaptation of language models
- **Review generation models** — train models to generate realistic hotel reviews given hotel attributes and traveler type
- **Text classification** — use tags and reviewer country as labels for multi-label classification tasks
- **Semantic search** — build dense retrieval systems over hotel reviews for question-answering applications
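
For the fine-tuning and review-generation ideas above, one possible way to shape a record into a prompt/completion pair is sketched below. The prompt wording, the `Liked:`/`Disliked:` layout, and the function names are all hypothetical. Only the field names come from the Data Fields table.

```python
from typing import Any, Dict, Optional


def _clean(value: Any) -> str:
    """Return a stripped string, or "" for None, NaN, and other non-strings."""
    return value.strip() if isinstance(value, str) else ""


def to_generation_example(rec: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Build a prompt/completion pair from one review record.

    The prompt template is an illustrative assumption, not a dataset convention.
    """
    liked = _clean(rec.get("positive_review_text"))
    disliked = _clean(rec.get("negative_review_text"))
    if not liked and not disliked:
        return None  # skip reviews that carry no free text

    prompt = (
        "Write a hotel review.\n"
        f"Hotel: {_clean(rec.get('hotel_name'))}\n"
        f"Country: {_clean(rec.get('country'))}\n"
        f"Traveler tags: {_clean(rec.get('tags'))}"
    )
    parts = [f"Title: {_clean(rec.get('review_title'))}"]
    if liked:
        parts.append(f"Liked: {liked}")
    if disliked:
        parts.append(f"Disliked: {disliked}")
    return {"prompt": prompt, "completion": "\n".join(parts)}
```

Keeping the liked and disliked parts on separate labeled lines preserves the split that the dataset already provides, rather than merging both texts into one blob.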
### Business Intelligence & Analytics
- **Seasonal sentiment analysis** — use stayed_at and reviewed_at to identify how sentiment shifts by season or event
- **Geo-targeted insights** — map reviewer country vs hotel country to understand which traveler nationalities rate hotels most positively
- **Review lag analysis** — measure time between stay and review submission as a signal of satisfaction
- **Competitive hotel intelligence** — aggregate sentiment by hotel name, address, or country for market analysis
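
The review lag mentioned above can be sketched as a simple date difference. This assumes `stayed_at` and `reviewed_at` hold ISO dates (`YYYY-MM-DD`) as in the sample record. The real file may use another format, in which case the helper returns `None` and the parsing would need adjusting. The name `review_lag_days` is illustrative.

```python
from datetime import date
from typing import Any, Optional


def review_lag_days(stayed_at: Any, reviewed_at: Any) -> Optional[int]:
    """Days between stay and review, or None if either value cannot be parsed.

    Assumes ISO dates (YYYY-MM-DD), as shown in the sample record.
    """
    try:
        stay = date.fromisoformat(stayed_at)
        review = date.fromisoformat(reviewed_at)
    except (TypeError, ValueError):
        # TypeError covers None/NaN, ValueError covers unexpected formats.
        return None
    return (review - stay).days


print(review_lag_days("2024-11-10", "2024-11-15"))  # 5
```

A negative result would mean the review date precedes the stay date, which is worth flagging as a data issue rather than treating as a real lag.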
---
## Sample Data
```json
{
"hotel_name": "Grand Plaza Hotel",
"hotel_address": "123 Main Street, Amsterdam",
"country": "Netherlands",
"review_title": "Great location, average breakfast",
"reviewer_name": "Sarah M.",
"reviewer_country": "United Kingdom",
"positive_review_text": "The location was perfect — walking distance to all major attractions. Room was clean and staff were very friendly and helpful.",
"negative_review_text": "Breakfast was disappointing for the price. Limited options and the coffee was lukewarm.",
"stayed_at": "2024-11-10",
"reviewed_at": "2024-11-15",
"tags": "Couple, Leisure trip, Superior Double Room, Stayed 3 nights",
"language": "en",
"source": "Booking.com"
}
```
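
The `tags` value in the sample record is a single comma-separated string. A minimal sketch for splitting it into a list of labels and a night count follows, assuming the `Stayed N nights` wording shown above is used consistently (the actual data may vary). `parse_tags` and its return shape are hypothetical.

```python
import re
from typing import Any, Dict

# Matches "Stayed 3 nights" or "Stayed 1 night"; the wording is assumed
# from the sample record.
NIGHTS_RE = re.compile(r"^Stayed (\d+) nights?$")


def parse_tags(tags: Any) -> Dict[str, Any]:
    """Split a tags string into free-form labels and a length of stay."""
    labels, nights = [], None
    if isinstance(tags, str):
        for part in tags.split(","):
            part = part.strip()
            if not part:
                continue
            match = NIGHTS_RE.match(part)
            if match:
                nights = int(match.group(1))
            else:
                labels.append(part)
    return {"labels": labels, "nights": nights}


print(parse_tags("Couple, Leisure trip, Superior Double Room, Stayed 3 nights"))
# {'labels': ['Couple', 'Leisure trip', 'Superior Double Room'], 'nights': 3}
```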
---
## Loading the Dataset
```python
from datasets import load_dataset
import pandas as pd
dataset = load_dataset("crawlfeeds/Booking-Hotel-Reviews-Dataset")
df = dataset["train"].to_pandas()
# Reviews with both positive and negative text
dual_reviews = df[
df["positive_review_text"].notna() &
df["negative_review_text"].notna()
]
# Filter by reviewer country
uk_reviewers = df[df["reviewer_country"] == "United Kingdom"]
# Filter by hotel country
netherlands_hotels = df[df["country"] == "Netherlands"]
# Create sentiment pairs for training
sentiment_pairs = df[["positive_review_text", "negative_review_text"]].dropna()
# Filter by traveler type using tags
solo_travelers = df[df["tags"].str.contains("Solo traveler", na=False)]
couples = df[df["tags"].str.contains("Couple", na=False)]
business = df[df["tags"].str.contains("Business trip", na=False)]
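# Prepare text for LLM fine-tuning.
# Note: dropna() keeps only rows where all three columns are present,
# so reviews with just one of the two text fields are dropped here.
text_data = df[["review_title", "positive_review_text", "negative_review_text"]].dropna()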
```
---
## Data Quality Notes
- **average_score and rating (0%)** — numerical scores are available in the full commercial dataset at crawlfeeds.com
- **helpful_count (0%)** — available in the full dataset, useful for review quality and credibility modeling
- **negative_review_text (35.9%)** — partial coverage reflects real-world Booking.com behavior where many reviewers only leave positive feedback
- **positive_review_text (55.8%)** — similarly reflects organic reviewer behavior on the platform
- All 100% coverage fields are complete and production-ready in this sample
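
To re-check coverage figures like these on a local copy, something along the following lines could be used. It treats `None`, NaN, and blank strings as missing, which is an assumption. The card does not say how its percentages were computed, so results may differ slightly. The helper names are illustrative.

```python
import math
from typing import Any, Dict, Iterable, List


def is_filled(value: Any) -> bool:
    """True unless the value is None, NaN, or a blank string."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    if isinstance(value, str) and not value.strip():
        return False
    return True


def field_coverage(records: List[Dict[str, Any]], fields: Iterable[str]) -> Dict[str, float]:
    """Percentage of records with a filled value, per field, rounded to 0.1."""
    total = len(records)
    coverage = {}
    for field in fields:
        filled = sum(1 for rec in records if is_filled(rec.get(field)))
        coverage[field] = round(100.0 * filled / total, 1) if total else 0.0
    return coverage
```

Calling `field_coverage(df.to_dict("records"), df.columns)` on the loaded pandas frame would give one number per column to compare against the Data Fields table.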
---
## Full Dataset & Custom Data
This is a **sample of 4,300+ records**. The full Crawl Feeds Booking.com dataset contains:
- Tens of thousands of reviews across global hotel properties
- Complete average_score and rating fields
- Full helpful_count data
- Hotels across 50+ countries
- Weekly and monthly refresh options
- Custom subsets by country, hotel rating, traveler type, or language
**Get the full dataset:** [crawlfeeds.com](https://crawlfeeds.com)
**Request custom Booking.com data:** [crawlfeeds.com/contact](https://crawlfeeds.com/contact)
---
## Related Datasets by Crawl Feeds
- [IKEA Home Decor & Furniture Dataset](https://huggingface.co/datasets/crawlfeeds/IKEA-Home-Decor-Furniture-Dataset)
- [Home Depot Smart Home Dataset](https://huggingface.co/datasets/crawlfeeds/HomeDepot-Smart-Home-Dataset)
- [Trustpilot Reviews Dataset – 20K Sample](https://huggingface.co/datasets/crawlfeeds/Trustpilot-Reviews-Dataset-20K-Sample)
- [Walmart Reviews Dataset](https://huggingface.co/datasets/crawlfeeds/walmart-reviews-dataset)
- [Medium Articles Corpus](https://huggingface.co/datasets/crawlfeeds/Medium-Articles-Corpus)
- [Fox News Headlines Dataset](https://huggingface.co/datasets/crawlfeeds/Curated-Fox-News-Headlines-and-Full-Text)
- Google Playstore Reviews Dataset *(coming soon)*
- Airbnb Reviews Dataset *(coming soon)*
- Tripadvisor Reviews Dataset *(coming soon)*
---
## License
This dataset is made available under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) and is intended for research and non-commercial use. For commercial licensing, please contact [crawlfeeds.com/contact](https://crawlfeeds.com/contact).
---
## Citation
```bibtex
@dataset{crawlfeeds_booking_reviews_2025,
author = {Crawl Feeds},
title = {Booking.com Hotel Reviews Dataset},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/crawlfeeds/Booking-Hotel-Reviews-Dataset}
}
```
---
*Data collected and maintained by [Crawl Feeds](https://crawlfeeds.com) — structured web data for AI, analytics, and business intelligence.*
Provider:
crawlfeeds



