EleutherAI/CEBaB
Source: Hugging Face. Updated 2023-08-16; added 2024-03-04.
Download link:
https://hf-mirror.com/datasets/EleutherAI/CEBaB
Description:
This dataset is a lightly cleaned and simplified version of CEBaB, a counterfactual restaurant review dataset. The main difference from the original is that the `rating` column holds the median of the ratings given by Mechanical Turk annotators rather than the majority vote. Where a majority rating exists, the two coincide; where there is none, the original dataset uses a `no majority` placeholder, whereas this version assigns an aggregated rating to every review. The dataset has the columns `original_id`, `edit_goal`, `edit_type`, `text`, `food`, `ambiance`, `service`, `noise`, `counterfactual`, and `rating`, and is split into training, validation, and test sets.
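Why the median can replace the majority vote without changing any ratings that already had a majority: once the ratings are sorted, a value held by a strict majority of annotators must cover the middle position, so it is also the median. The sketch below illustrates this, assuming five integer ratings per review on a 1 to 5 scale (the annotator count and scale are assumptions, not stated in this card). The helper names `majority_rating` and `median_rating` are hypothetical, and `None` stands in for the original `no majority` placeholder.

```python
from collections import Counter
from statistics import median


def majority_rating(ratings):
    """Return the rating chosen by a strict majority of annotators, else None.

    None plays the role of the original dataset's `no majority` placeholder.
    """
    value, count = Counter(ratings).most_common(1)[0]
    return value if count > len(ratings) / 2 else None


def median_rating(ratings):
    """Aggregate by the median; with an odd number of ratings it is one of them."""
    return median(ratings)


examples = [
    [4, 4, 4, 5, 2],  # majority 4, median 4
    [1, 2, 3, 4, 5],  # no majority, median 3
    [5, 5, 3, 3, 4],  # no majority, median 4
]

for ratings in examples:
    print(ratings, majority_rating(ratings), median_rating(ratings))
```

Reviews without a majority still receive a rating under the median rule, which is the behavior described above for this version of the dataset.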
Provided by:
EleutherAI
Original information summary
Dataset overview
Dataset information
- Features:
  - original_id: int32
  - edit_goal: string
  - edit_type: string
  - text: string
  - food: string
  - ambiance: string
  - service: string
  - noise: string
  - counterfactual: bool
  - rating: int64
- Splits:
  - validation: 306529 bytes, 1673 examples
  - test: 309751 bytes, 1689 examples
  - train: 2282439 bytes, 11728 examples
- Dataset size:
  - Download size: 628886 bytes
  - Dataset size: 2898719 bytes
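The split figures are internally consistent: the three split sizes in bytes add up to the reported dataset size, and the example counts give roughly a 78/11/11 train/validation/test split. The short check below recomputes both from the numbers listed in this card; the dictionary layout is only a convenience for the example.

```python
# Split statistics copied from this card: (size in bytes, number of examples).
splits = {
    "train": (2_282_439, 11_728),
    "validation": (306_529, 1_673),
    "test": (309_751, 1_689),
}
reported_dataset_bytes = 2_898_719

total_bytes = sum(size for size, _ in splits.values())
total_examples = sum(n for _, n in splits.values())

# The per-split byte counts should sum to the reported dataset size.
assert total_bytes == reported_dataset_bytes

for name, (_, n) in splits.items():
    # Prints 77.7% for train, 11.1% for validation, 11.2% for test.
    print(f"{name}: {n} examples ({n / total_examples:.1%})")
```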
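The `original_id` and `counterfactual` columns make it possible to pair each edited review with the review it was derived from. The sketch below assumes that rows sharing an `original_id` come from one source review and that exactly one of them has `counterfactual` set to False; the `pair_counterfactuals` helper and the sample rows are hypothetical illustrations (made-up values, only a subset of columns), not taken from the dataset. In practice the rows could be obtained with the Hugging Face `datasets` library, for example `load_dataset("EleutherAI/CEBaB", split="train")`, whose items iterate as dictionaries.

```python
from collections import defaultdict


def pair_counterfactuals(rows):
    """Pair each counterfactual edit with the original review sharing its original_id.

    Returns a list of (original_row, edited_row) tuples. Assumes exactly one
    row per original_id has counterfactual == False; other groups are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["original_id"]].append(row)

    pairs = []
    for group in groups.values():
        originals = [r for r in group if not r["counterfactual"]]
        if len(originals) != 1:
            continue  # group does not match the assumed layout
        original = originals[0]
        pairs.extend((original, r) for r in group if r["counterfactual"])
    return pairs


# Hypothetical rows using some of the card's column names (values are made up).
rows = [
    {"original_id": 7, "text": "Great pasta, slow service.",
     "service": "Negative", "counterfactual": False, "rating": 3},
    {"original_id": 7, "text": "Great pasta, quick service.",
     "service": "Positive", "counterfactual": True, "rating": 5},
]

for original, edited in pair_counterfactuals(rows):
    # Rating change between the edited review and its source review.
    print(edited["rating"] - original["rating"])
```

Grouping in a plain dictionary keeps the example dependency-free; with a `datasets.Dataset`, the same function can be applied to the dataset directly because iterating it yields one dictionary per row.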
Task categories
- Text classification
Languages
- English



