
crawlfeeds/Booking-Hotel-Reviews-Dataset

Source: Hugging Face | Updated: 2026-04-10 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/crawlfeeds/Booking-Hotel-Reviews-Dataset
Description:
---
language:
- en
- multilingual
license: cc-by-nc-4.0
task_categories:
- text-classification
- text-generation
- feature-extraction
- zero-shot-classification
- summarization
pretty_name: Booking.com Hotel Reviews Dataset – 4.3K Sample
size_categories:
- 1K<n<10K
tags:
- booking
- hotel-reviews
- travel
- hospitality
- sentiment-analysis
- nlp
- reviews
- ecommerce
- llm-training
- ai-training
- fine-tuning
- aspect-based-sentiment
- opinion-mining
- recommendation-systems
- multilingual
- travel-ai
configs:
- config_name: default
  data_files:
  - split: train
    path: Crawlfeeds_booking-italy-hotel-reviews__limit-100000_language-en_after-2025-06_reviewer_country-3-items_20260410_063921.csv
---

# Booking.com Hotel Reviews Dataset – 4.3K Sample

A structured dataset of hotel reviews collected from Booking.com. Each record stores positive and negative review text in separate fields, alongside reviewer country, stay dates, traveler tags, and hotel location data. It is suited to sentiment analysis, aspect-based opinion mining, travel AI, hospitality recommendation systems, and LLM fine-tuning on real-world review data.

---

## Dataset Overview

| Field | Details |
|-------|---------|
| **Source** | Booking.com |
| **Total Records** | 4,300+ |
| **Category Focus** | Hotel Reviews, Travel, Hospitality |
| **Languages** | Multilingual (language field included) |
| **Formats** | CSV / JSON |
| **Data Quality** | Validated and structured |
| **Provider** | [Crawl Feeds](https://crawlfeeds.com) |

---

## What Makes This Dataset Valuable

- **Split positive and negative review text**: unlike most review datasets, which provide a single text blob, this dataset separates positive and negative feedback into distinct fields, enabling aspect-based sentiment analysis and fine-grained opinion mining out of the box
- **Traveler tags**: structured tags per review (e.g. "Solo traveler", "Couple", "Family with kids", "Business trip") provide contextual labels for segmentation models and personalized recommendation systems
- **Reviewer country**: 99.9% coverage on reviewer nationality enables cross-cultural sentiment analysis and geo-targeted hospitality intelligence
- **Stay date vs review date**: both `stayed_at` and `reviewed_at` fields are present, enabling temporal analysis of review lag and seasonal sentiment patterns
- **Hotel location data**: hotel name, address, and country fields allow geographic clustering and location-based recommendation modeling
- **Multilingual coverage**: the language field enables multilingual NLP model training and evaluation
- **Helpful count**: a social validation signal for review quality modeling, populated only in the full dataset

---

## Data Fields

| Field | Type | Coverage | Description |
|-------|------|----------|-------------|
| url | String | 100% | Direct review page URL |
| hotel_name | String | 100% | Name of the reviewed hotel |
| hotel_address | String | 100% | Full hotel address |
| country | String | 100% | Country where the hotel is located |
| average_score | Float | 0% | Hotel average score (full dataset) |
| review_title | String | 100% | Title of the review |
| reviewer_name | String | 98.3% | Name of the reviewer |
| rating | Float | 0% | Numerical rating (full dataset) |
| reviewer_country | String | 99.9% | Country of the reviewer |
| negative_review_text | String | 35.9% | Reviewer's negative feedback |
| positive_review_text | String | 55.8% | Reviewer's positive feedback |
| helpful_count | Int | 0% | Number of helpful votes (full dataset) |
| reviewed_at | String | 100% | Date the review was submitted |
| stayed_at | String | 100% | Date of the actual stay |
| tags | String | 100% | Traveler type and trip context tags |
| source | String | 100% | Source platform name |
| source_domain | String | 100% | Source platform domain |
| language | String | 100% | Language of the review |
| uniq_id | String | 100% | Unique record identifier |
| scraped_at | String | 100% | Data collection timestamp |

---

## Use Cases

### Sentiment Analysis & Opinion Mining

- **Aspect-based sentiment analysis**: use the positive/negative split to train models that identify sentiment by hotel aspect (cleanliness, location, service, value)
- **Fine-grained opinion mining**: the dual-text structure removes the need for manual sentiment labeling, providing ready-to-use positive/negative training pairs
- **Zero-shot sentiment classification**: use review titles and text for zero-shot benchmarking of LLMs on the hospitality domain
- **Multilingual sentiment models**: use the language field to build and evaluate cross-lingual sentiment classifiers

### Travel & Hospitality AI

- **Hotel recommendation systems**: combine ratings, tags, reviewer country, and hotel location for collaborative and content-based filtering
- **Personalized travel AI**: traveler type tags enable user segmentation for personalized hotel recommendations
- **Review summarization**: train abstractive summarization models to condense positive and negative feedback into concise hotel summaries
- **Chatbot training for hospitality**: use review text to build domain-specific conversational AI for hotel booking assistants

### NLP & LLM Training

- **LLM fine-tuning on review data**: real-world hospitality text for domain adaptation of language models
- **Review generation models**: train models to generate realistic hotel reviews given hotel attributes and traveler type
- **Text classification**: use tags and reviewer country as labels for multi-label classification tasks
- **Semantic search**: build dense retrieval systems over hotel reviews for question-answering applications

### Business Intelligence & Analytics

- **Seasonal sentiment analysis**: use `stayed_at` and `reviewed_at` to identify how sentiment shifts by season or event
- **Geo-targeted insights**: map
reviewer country vs hotel country to understand which traveler nationalities rate hotels most positively
- **Review lag analysis**: measure time between stay and review submission as a signal of satisfaction
- **Competitive hotel intelligence**: aggregate sentiment by hotel name, address, or country for market analysis

---

## Sample Data

```json
{
  "hotel_name": "Grand Plaza Hotel",
  "hotel_address": "123 Main Street, Amsterdam",
  "country": "Netherlands",
  "review_title": "Great location, average breakfast",
  "reviewer_name": "Sarah M.",
  "reviewer_country": "United Kingdom",
  "positive_review_text": "The location was perfect - walking distance to all major attractions. Room was clean and staff were very friendly and helpful.",
  "negative_review_text": "Breakfast was disappointing for the price. Limited options and the coffee was lukewarm.",
  "stayed_at": "2024-11-10",
  "reviewed_at": "2024-11-15",
  "tags": "Couple, Leisure trip, Superior Double Room, Stayed 3 nights",
  "language": "en",
  "source": "Booking.com"
}
```

---

## Loading the Dataset

```python
from datasets import load_dataset
import pandas as pd

# Load the default config and convert the train split to a DataFrame
dataset = load_dataset("crawlfeeds/Booking-Hotel-Reviews-Dataset")
df = dataset["train"].to_pandas()

# Reviews with both positive and negative text
dual_reviews = df[
    df["positive_review_text"].notna() & df["negative_review_text"].notna()
]

# Filter by reviewer country
uk_reviewers = df[df["reviewer_country"] == "United Kingdom"]

# Filter by hotel country
netherlands_hotels = df[df["country"] == "Netherlands"]

# Create sentiment pairs for training
sentiment_pairs = df[["positive_review_text", "negative_review_text"]].dropna()

# Filter by traveler type using tags (substring match; missing tags count as no match)
solo_travelers = df[df["tags"].str.contains("Solo traveler", na=False)]
couples = df[df["tags"].str.contains("Couple", na=False)]
business = df[df["tags"].str.contains("Business trip", na=False)]

# Prepare text for LLM fine-tuning (keeps only rows where all three fields are present)
text_data = df[["review_title", "positive_review_text", "negative_review_text"]].dropna()
```

---

## Data Quality Notes

- **average_score and rating (0%)**: numerical scores are available in the full commercial dataset at crawlfeeds.com
- **helpful_count (0%)**: available in the full dataset, useful for review quality and credibility modeling
- **negative_review_text (35.9%)**: partial coverage reflects real-world Booking.com behavior, where many reviewers leave only positive feedback
- **positive_review_text (55.8%)**: similarly reflects organic reviewer behavior on the platform
- All 100% coverage fields are complete and production-ready in this sample

---

## Full Dataset & Custom Data

This is a **4,300-record sample**. The full CrawlFeeds Booking.com dataset contains:

- Tens of thousands of reviews across global hotel properties
- Complete average_score and rating fields
- Full helpful_count data
- Hotels across 50+ countries
- Weekly and monthly refresh options
- Custom subsets by country, hotel rating, traveler type, or language

**Get the full dataset:** [crawlfeeds.com](https://crawlfeeds.com)

**Request custom Booking.com data:** [crawlfeeds.com/contact](https://crawlfeeds.com/contact)

---

## Related Datasets by Crawl Feeds

- [IKEA Home Decor & Furniture Dataset](https://huggingface.co/datasets/crawlfeeds/IKEA-Home-Decor-Furniture-Dataset)
- [Home Depot Smart Home Dataset](https://huggingface.co/datasets/crawlfeeds/HomeDepot-Smart-Home-Dataset)
- [Trustpilot Reviews Dataset – 20K Sample](https://huggingface.co/datasets/crawlfeeds/Trustpilot-Reviews-Dataset-20K-Sample)
- [Walmart Reviews Dataset](https://huggingface.co/datasets/crawlfeeds/walmart-reviews-dataset)
- [Medium Articles Corpus](https://huggingface.co/datasets/crawlfeeds/Medium-Articles-Corpus)
- [Fox News Headlines Dataset](https://huggingface.co/datasets/crawlfeeds/Curated-Fox-News-Headlines-and-Full-Text)
- Google Playstore Reviews Dataset *(coming soon)*
- Airbnb Reviews Dataset *(coming soon)*
- Tripadvisor Reviews Dataset *(coming soon)*

---

## License

This dataset is made available under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). It is intended for research and non-commercial use. For commercial licensing, please contact [crawlfeeds.com/contact](https://crawlfeeds.com/contact).

---

## Citation

```bibtex
@dataset{crawlfeeds_booking_reviews_2025,
  author    = {Crawl Feeds},
  title     = {Booking.com Hotel Reviews Dataset},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/crawlfeeds/Booking-Hotel-Reviews-Dataset}
}
```

---

*Data collected and maintained by [Crawl Feeds](https://crawlfeeds.com): structured web data for AI, analytics, and business intelligence.*
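The review lag analysis listed under Use Cases could be sketched roughly as follows. This is a minimal, standard-library illustration that assumes `stayed_at` and `reviewed_at` are ISO `YYYY-MM-DD` strings, as in the sample record; the actual CSV may use another date format, and `review_lag_days` is a hypothetical helper name, not part of the dataset or any library.

```python
from datetime import date
from typing import Optional


def review_lag_days(record: dict) -> Optional[int]:
    """Days between the stay date and the review date.

    Assumes ISO YYYY-MM-DD strings (an assumption based on the sample record).
    Returns None when either date is missing or cannot be parsed.
    """
    try:
        stayed = date.fromisoformat(record["stayed_at"])
        reviewed = date.fromisoformat(record["reviewed_at"])
    except (KeyError, TypeError, ValueError):
        return None
    return (reviewed - stayed).days


# Dates taken from the sample record in this card
sample = {"stayed_at": "2024-11-10", "reviewed_at": "2024-11-15"}
lag = review_lag_days(sample)
print(lag)
```

Returning `None` instead of raising keeps unparseable rows visible, so they can be counted before being dropped from an aggregate.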
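For the traveler-type use cases, the `tags` string can be split into individual tags so that membership is tested exactly rather than by substring. The sketch below assumes tags are comma-separated and that stay length appears as "Stayed N nights", as in the sample record; both are assumptions about the data, and `parse_tags` is a hypothetical helper.

```python
import re
from typing import Any, Dict


def parse_tags(tags: Any) -> Dict[str, Any]:
    """Split a comma-separated tags string and extract the stay length.

    Assumes the "A, B, Stayed N nights" layout seen in the sample record.
    Non-string input (e.g. a missing value) yields an empty result.
    """
    if not isinstance(tags, str):
        return {"tags": [], "nights": None}
    parts = [part.strip() for part in tags.split(",") if part.strip()]
    nights = None
    for part in parts:
        match = re.fullmatch(r"Stayed (\d+) nights?", part)
        if match:
            nights = int(match.group(1))
    return {"tags": parts, "nights": nights}


parsed = parse_tags("Couple, Leisure trip, Superior Double Room, Stayed 3 nights")
is_couple = "Couple" in parsed["tags"]  # exact tag match, not a substring search
print(parsed["tags"], parsed["nights"], is_couple)
```

A tag that itself contains a comma would be split incorrectly by this approach, so it is worth checking a sample of real rows first.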
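The positive/negative split described above can be turned into labeled training rows, one per non-empty text field. The following sketch operates on plain dictionaries shaped like the sample record; `to_labeled_examples` and the label strings are illustrative choices, not something the dataset defines. The field a text appears in is only a weak label, since a reviewer may write something neutral in either box.

```python
from typing import Dict, Iterable, List

# Maps each text field to the (assumed) label it implies
FIELD_LABELS = (
    ("positive_review_text", "positive"),
    ("negative_review_text", "negative"),
)


def to_labeled_examples(records: Iterable[dict]) -> List[Dict[str, str]]:
    """Emit one {"text", "label"} row per non-empty review text field."""
    examples = []
    for record in records:
        for field, label in FIELD_LABELS:
            text = record.get(field)
            # Skip missing values (None, NaN) and whitespace-only strings
            if isinstance(text, str) and text.strip():
                examples.append({"text": text.strip(), "label": label})
    return examples


records = [
    {"positive_review_text": "Great location.", "negative_review_text": "Lukewarm coffee."},
    {"positive_review_text": "Friendly staff.", "negative_review_text": None},
    {"positive_review_text": "  ", "negative_review_text": float("nan")},
]
examples = to_labeled_examples(records)
print(len(examples))
```

Unlike dropping rows where either field is missing, this keeps reviews that have only one of the two texts, which matters given the partial coverage of both fields.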
Provider:
crawlfeeds