cmotions/NL_restaurant_reviews
Source: Hugging Face. Updated 2022-04-21; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/cmotions/NL_restaurant_reviews
Resource description:
---
language:
- nl
tags:
- text-classification
- sentiment-analysis
datasets:
- train
- test
- validation
---
## Dataset overview
This dataset contains reviews of Dutch restaurants, gathered in 2019 with a web-scraping tool written in Python. The reviews cover both restaurant visits and restaurant features.
The dataset is stored in the 🤗[DatasetDict](https://huggingface.co/docs/datasets/index) format and contains the following splits:
- train, 116693 records
- test, 14587 records
- validation, 14587 records
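
The split sizes above work out to roughly an 80/10/10 division. The following is a minimal sketch of loading the data and checking it against those counts. It assumes the repository can be read with `load_dataset` from the `datasets` library; the helper names (`split_shares`, `load_reviews`, `has_expected_sizes`) are illustrative and not part of the dataset.

```python
# Split sizes as listed on the dataset card.
EXPECTED_ROWS = {"train": 116693, "test": 14587, "validation": 14587}


def split_shares(rows):
    """Return each split's share of the total number of records."""
    total = sum(rows.values())
    return {name: count / total for name, count in rows.items()}


def load_reviews(repo_id="cmotions/NL_restaurant_reviews"):
    """Download the dataset from the Hugging Face Hub.

    Assumption: the repository is readable by `datasets.load_dataset`.
    The import is local so the other helpers work without the library.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


def has_expected_sizes(dataset_dict, expected=EXPECTED_ROWS):
    """Check that every split has the row count stated on the card."""
    actual = {name: len(split) for name, split in dataset_dict.items()}
    return actual == expected


if __name__ == "__main__":
    for name, share in split_shares(EXPECTED_ROWS).items():
        print(f"{name}: {share:.1%}")
```

Calling `has_expected_sizes(load_reviews())` requires network access and the `datasets` package; a mismatch would suggest the hosted data has changed since the card was written.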
The dataset holds information at both the restaurant level and the review level, and contains the following features:
- [restaurant_ID] > unique restaurant ID
- [restaurant_review_ID] > unique review ID
- [michelin_label] > indicator whether this restaurant was awarded one (or more) Michelin stars prior to 2020
- [score_total] > restaurant level total score
- [score_food] > restaurant level food score
- [score_service] > restaurant level service score
- [score_decor] > restaurant level decor score
- [fame_reviewer] > label for how often a reviewer has posted a restaurant review
- [reviewscore_food] > review level food score
- [reviewscore_service] > review level service score
- [reviewscore_ambiance] > review level ambiance score
- [reviewscore_waiting] > review level waiting score
- [reviewscore_value] > review level value for money score
- [reviewscore_noise] > review level noise score
- [review_text] > the full review that was written by the reviewer for this restaurant
- [review_length] > total length of the review (tokens)
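
Because restaurant-level fields sit alongside review-level fields, each row presumably repeats its restaurant's values for every review. The sketch below groups the feature names by level and collapses rows to one entry per restaurant. The grouping is a reading of the descriptions above, the assumption that restaurant-level values are constant within a restaurant is not confirmed by the card, and the records in `TOY_RECORDS` are invented for illustration.

```python
# Feature names grouped by level, following the descriptions on the card.
RESTAURANT_LEVEL = [
    "restaurant_ID", "michelin_label",
    "score_total", "score_food", "score_service", "score_decor",
]
REVIEW_LEVEL = [
    "restaurant_review_ID", "fame_reviewer",
    "reviewscore_food", "reviewscore_service", "reviewscore_ambiance",
    "reviewscore_waiting", "reviewscore_value", "reviewscore_noise",
    "review_text", "review_length",
]


def restaurant_rows(records):
    """Return one dict per restaurant with only the restaurant-level fields.

    Keeps the first occurrence per restaurant_ID, assuming these values
    do not vary across a restaurant's reviews.
    """
    seen = {}
    for record in records:
        key = record["restaurant_ID"]
        if key not in seen:
            seen[key] = {name: record[name] for name in RESTAURANT_LEVEL}
    return list(seen.values())


def _toy(restaurant_id, review_id, michelin, text):
    """Build an invented record; the score values are placeholders."""
    return {
        "restaurant_ID": restaurant_id, "restaurant_review_ID": review_id,
        "michelin_label": michelin, "score_total": 8.0, "score_food": 8.0,
        "score_service": 8.0, "score_decor": 8.0, "review_text": text,
    }


# Invented example rows: two reviews for restaurant 1, one for restaurant 2.
TOY_RECORDS = [
    _toy(1, 10, 0, "Lekker gegeten."),
    _toy(1, 11, 0, "Prima service."),
    _toy(2, 20, 1, "Fantastisch menu."),
]

if __name__ == "__main__":
    for row in restaurant_rows(TOY_RECORDS):
        print(row["restaurant_ID"], row["michelin_label"])
```

A restaurant-level table of this kind is a natural starting point when the target is `michelin_label`, whereas the review-level scores are targets per individual review.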
## Purpose
The reviews submitted by visitors can be used to model the restaurant scores (food, ambiance, etc.) or to identify Michelin star holders. In [this blog series](https://medium.com/broadhorizon-cmotions/natural-language-processing-for-predictive-purposes-with-r-cb65f009c12b) we used the review texts to predict the next Michelin star restaurants, using R.
Provider:
cmotions
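Summary of the original information
Dataset overview
The dataset contains reviews of Dutch restaurants collected in 2019 with a Python web-scraping tool. The reviews cover restaurant visits and restaurant features. The dataset uses the 🤗DatasetDict format and contains the following splits:
- Training set: 116693 records
- Test set: 14587 records
- Validation set: 14587 records
Dataset features
The dataset contains both restaurant-level and review-level information, with the following features:
- [restaurant_ID] > unique restaurant ID
- [restaurant_review_ID] > unique review ID
- [michelin_label] > whether the restaurant was awarded Michelin stars before 2020
- [score_total] > restaurant total score
- [score_food] > restaurant food score
- [score_service] > restaurant service score
- [score_decor] > restaurant decor score
- [fame_reviewer] > how often the reviewer posts reviews
- [reviewscore_food] > review food score
- [reviewscore_service] > review service score
- [reviewscore_ambiance] > review ambiance score
- [reviewscore_waiting] > review waiting score
- [reviewscore_value] > review value-for-money score
- [reviewscore_noise] > review noise score
- [review_text] > full text of the review
- [review_length] > length of the review (word count)
Dataset purpose
The dataset can be used to model restaurant scores (such as food and ambiance) or to predict Michelin-starred restaurants. In the blog series, we used the review texts to predict the next Michelin-starred restaurants, using R.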



