
cmotions/NL_restaurant_reviews

Hugging Face · Updated 2022-04-21 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/cmotions/NL_restaurant_reviews
Resource description:
---
language:
- nl
tags:
- text-classification
- sentiment-analysis
datasets:
- train
- test
- validation
---

## Dataset overview

This dataset contains restaurant reviews gathered in 2019 using a web scraping tool in Python. Reviews on restaurant visits and restaurant features were collected for Dutch restaurants. The dataset is formatted using the 🤗 [DatasetDict](https://huggingface.co/docs/datasets/index) format and contains the following splits:

- train, 116693 records
- test, 14587 records
- validation, 14587 records

The dataset holds information at both the restaurant level and the review level, and contains the following features:

- [restaurant_ID] > unique restaurant ID
- [restaurant_review_ID] > unique review ID
- [michelin_label] > indicator whether this restaurant was awarded one (or more) Michelin stars prior to 2020
- [score_total] > restaurant level total score
- [score_food] > restaurant level food score
- [score_service] > restaurant level service score
- [score_decor] > restaurant level decor score
- [fame_reviewer] > label for how often a reviewer has posted a restaurant review
- [reviewscore_food] > review level food score
- [reviewscore_service] > review level service score
- [reviewscore_ambiance] > review level ambiance score
- [reviewscore_waiting] > review level waiting score
- [reviewscore_value] > review level value for money score
- [reviewscore_noise] > review level noise score
- [review_text] > the full review that was written by the reviewer for this restaurant
- [review_length] > total length of the review (tokens)

## Purpose

The restaurant reviews submitted by visitors can be used to model the restaurant scores (food, ambiance, etc.) or to model Michelin star holders. In [this blog series](https://medium.com/broadhorizon-cmotions/natural-language-processing-for-predictive-purposes-with-r-cb65f009c12b) we used the review texts to predict the next Michelin star restaurants, using R.
Provider:
cmotions
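Summary of original information

Dataset overview

This dataset contains reviews of Dutch restaurants collected in 2019 with a Python web scraping tool. The reviews cover restaurant visits and restaurant features. The dataset uses the 🤗 DatasetDict format and contains the following splits:

  • Training set: 116693 records
  • Test set: 14587 records
  • Validation set: 14587 records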
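As a quick sanity check on the split sizes listed above, the following sketch computes each split's share of the total and flags any split whose size differs from the stated count. The counts come from the dataset card; the helper names `split_fractions` and `mismatched_splits` are hypothetical. The commented lines show how the actual sizes could be read with the Hugging Face `datasets` library, assuming it is installed and the Hub is reachable.

```python
# Split sizes as stated on the dataset card.
EXPECTED_SPLITS = {"train": 116693, "test": 14587, "validation": 14587}


def split_fractions(sizes):
    """Return each split's share of the total number of records."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def mismatched_splits(actual, expected=EXPECTED_SPLITS):
    """Return the names of splits whose size differs from the expected size."""
    return sorted(name for name in expected if actual.get(name) != expected[name])


# With the `datasets` library installed, the actual sizes could be read like this
# (not executed here because it needs network access):
#   from datasets import load_dataset
#   ds = load_dataset("cmotions/NL_restaurant_reviews")
#   actual = {name: ds[name].num_rows for name in ds}
#   print(mismatched_splits(actual))

if __name__ == "__main__":
    for name, share in split_fractions(EXPECTED_SPLITS).items():
        print(f"{name}: {share:.1%}")
```

The three splits add up to 145867 records, which corresponds to roughly an 80/10/10 division.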

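Dataset features

The dataset contains both restaurant-level and review-level information, with the following features:

  • restaurant_ID: unique restaurant ID
  • restaurant_review_ID: unique review ID
  • michelin_label: whether the restaurant was awarded one or more Michelin stars before 2020
  • score_total: restaurant-level total score
  • score_food: restaurant-level food score
  • score_service: restaurant-level service score
  • score_decor: restaurant-level decor score
  • fame_reviewer: label for how often the reviewer has posted restaurant reviews
  • reviewscore_food: review-level food score
  • reviewscore_service: review-level service score
  • reviewscore_ambiance: review-level ambiance score
  • reviewscore_waiting: review-level waiting score
  • reviewscore_value: review-level value-for-money score
  • reviewscore_noise: review-level noise score
  • review_text: full text of the review
  • review_length: length of the review (in tokens)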
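The feature list above can be written down as a small schema check. This is a minimal sketch: the column names are those listed on the dataset card, while the grouping into restaurant-level and review-level columns (including placing `fame_reviewer` with the review-level columns) and the helper name `missing_columns` are assumptions made for illustration.

```python
# Column names as listed on the dataset card, grouped by level.
# The grouping itself is an assumption based on the descriptions.
RESTAURANT_COLUMNS = [
    "restaurant_ID", "michelin_label",
    "score_total", "score_food", "score_service", "score_decor",
]
REVIEW_COLUMNS = [
    "restaurant_review_ID", "fame_reviewer",
    "reviewscore_food", "reviewscore_service", "reviewscore_ambiance",
    "reviewscore_waiting", "reviewscore_value", "reviewscore_noise",
    "review_text", "review_length",
]
ALL_COLUMNS = RESTAURANT_COLUMNS + REVIEW_COLUMNS


def missing_columns(record):
    """Return the expected column names that are absent from a record (a dict)."""
    return [name for name in ALL_COLUMNS if name not in record]


if __name__ == "__main__":
    # A deliberately incomplete record to show what the check reports.
    partial = {"restaurant_ID": 1, "review_text": "Prima."}
    print(missing_columns(partial))
```

Running a check like this on the first record of each split is a cheap way to confirm that a downloaded copy matches the documented columns before any modeling starts.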

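Dataset purpose

The dataset can be used to model restaurant scores (food, ambiance, etc.) or to predict which restaurants hold Michelin stars. In the accompanying blog series, the authors used the review texts to predict the next Michelin star restaurants, using R.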
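Because `michelin_label` describes a restaurant while `review_text` belongs to a single review, one possible preparation step for modeling Michelin star holders is to combine all reviews of a restaurant into one document. The sketch below illustrates that idea in Python; it is an assumption about how the data could be prepared, not the method used in the blog series, and both the helper name `reviews_per_restaurant` and the toy records are invented for illustration.

```python
from collections import defaultdict


def reviews_per_restaurant(records):
    """Group review texts by restaurant, keeping one Michelin label per restaurant."""
    texts = defaultdict(list)
    labels = {}
    for rec in records:
        rid = rec["restaurant_ID"]
        texts[rid].append(rec["review_text"])
        labels[rid] = rec["michelin_label"]  # assumed constant within a restaurant
    return [
        {
            "restaurant_ID": rid,
            "text": " ".join(texts[rid]),
            "michelin_label": labels[rid],
        }
        for rid in texts
    ]


# Toy records, invented for illustration only; they are not taken from the dataset,
# and the real encoding of michelin_label may differ.
TOY_RECORDS = [
    {"restaurant_ID": 1, "review_text": "Heerlijk gegeten.", "michelin_label": True},
    {"restaurant_ID": 2, "review_text": "Lang wachten.", "michelin_label": False},
    {"restaurant_ID": 1, "review_text": "Prima service.", "michelin_label": True},
]

if __name__ == "__main__":
    for row in reviews_per_restaurant(TOY_RECORDS):
        print(row)
```

Grouping before the train/test division is applied would also keep all reviews of one restaurant on the same side of the division, which may matter when the target is defined per restaurant.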
