ruanchaves/b2w-reviews01
Source: Hugging Face. Updated: 2023-01-20. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ruanchaves/b2w-reviews01
Resource description:
---
annotations_creators:
- found
language:
- pt
language_creators:
- found
license:
- cc-by-4.0
multilinguality:
- monolingual
pretty_name: B2W-Reviews01
size_categories:
- 100K<n<1M
source_datasets:
- original
tags:
- reviews
task_categories:
- text-classification
task_ids:
- sentiment-analysis
- sentiment-scoring
- intent-classification
- topic-classification
---
# Dataset Card for B2W-Reviews01
## Dataset Description
- **Repository:** https://github.com/americanas-tech/b2w-reviews01
- **Paper:** http://comissoes.sbc.org.br/ce-pln/stil2019/proceedings-stil-2019-Final-Publicacao.pdf
- **Point of Contact:** Livy Real
### Dataset Summary
B2W-Reviews01 is an open corpus of product reviews. It contains more than 130k e-commerce customer reviews collected from the Americanas.com website between January and May 2018. B2W-Reviews01 offers rich information about the reviewer profile, such as gender, age, and geographical location. The corpus also provides two different ratings for each review:
* the usual 5-point scale rate, represented by stars in most e-commerce websites,
* a "recommend to a friend" label, a "yes or no" question representing the willingness of the customer to recommend the product to someone else.
### Supported Tasks and Leaderboards
* Sentiment Analysis
* Topic Modeling
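For sentiment analysis, either of the two ratings described in the summary can serve as a label. The sketch below shows one hypothetical way to derive labels from a record. The function names and the polarity thresholds (1-2 negative, 3 neutral, 4-5 positive) are assumptions made for illustration; neither the corpus nor its paper prescribes them.

```python
def rating_to_polarity(overall_rating: int) -> str:
    """Map the 1-5 star rating to a coarse polarity label (assumed thresholds)."""
    if not 1 <= overall_rating <= 5:
        raise ValueError(f"overall_rating out of range: {overall_rating}")
    if overall_rating <= 2:
        return "negative"
    if overall_rating == 3:
        return "neutral"
    return "positive"


def recommend_to_bool(recommend_to_a_friend: str) -> bool:
    """Convert the 'Yes'/'No' recommendation label to a boolean."""
    value = recommend_to_a_friend.strip().lower()
    if value not in {"yes", "no"}:
        raise ValueError(f"unexpected label: {recommend_to_a_friend!r}")
    return value == "yes"


# Minimal record carrying only the two rating fields.
review = {"overall_rating": 1, "recommend_to_a_friend": "Yes"}
polarity = rating_to_polarity(review["overall_rating"])
recommends = recommend_to_bool(review["recommend_to_a_friend"])
```

The two signals need not agree for a given review, so it may be worth keeping both labels rather than merging them into one.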
### Languages
* Portuguese
## Dataset Structure
### Data Instances
```
{'submission_date': '2018-01-02 06:23:22',
'reviewer_id': '6adc7901926fc1697d34181fbd88895976b4f3f31f0102d90217d248a1fad156',
'product_id': '123911277',
'product_name': 'Triciclo Gangorra Belfix Cabeça Cachorro Rosa',
'product_brand': 'belfix',
'site_category_lv1': 'Brinquedos',
'site_category_lv2': 'Mini Veículos',
'review_title': 'O produto não foi entregue',
'overall_rating': 1,
'recommend_to_a_friend': 'Yes',
'review_text': 'Incrível o descaso com o consumidor. O produto não chegou, apesar de já ter sido pago. Não recebo qualquer informação sobre onde se encontra o produto, ou qualquer compensação do vendedor. Não recomendo.',
'reviewer_birth_year': 1981,
'reviewer_gender': 'M',
'reviewer_state': 'RJ'}
```
### Data Fields
* **submission_date**: the date and time the review was submitted, formatted as `"%Y-%m-%d %H:%M:%S"`.
* **reviewer_id**: a unique identifier for the reviewer.
* **product_id**: a unique identifier for the product being reviewed.
* **product_name**: the name of the product being reviewed.
* **product_brand**: the brand of the product being reviewed.
* **site_category_lv1**: the highest level category for the product on the site where the review is being submitted.
* **site_category_lv2**: the second level category for the product on the site where the review is being submitted.
* **review_title**: the title of the review.
* **overall_rating**: the overall star rating given by the reviewer on a scale of 1 to 5.
* **recommend_to_a_friend**: whether or not the reviewer would recommend the product to a friend (Yes/No).
* **review_text**: the full text of the review.
* **reviewer_birth_year**: the birth year of the reviewer.
* **reviewer_gender**: the gender of the reviewer (F/M).
* **reviewer_state**: the Brazilian state of the reviewer (e.g. RJ).
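As a rough illustration of how these fields might be consumed, the sketch below parses `submission_date` with the stated format and estimates the reviewer's age at submission time from `reviewer_birth_year`. The helper names are hypothetical, and the handling of a missing birth year is an assumption, since the card does not say whether the field can be empty.

```python
from datetime import datetime
from typing import Optional

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # format stated in the field list


def parse_submission_date(value: str) -> datetime:
    """Parse the submission timestamp string into a datetime."""
    return datetime.strptime(value, DATE_FORMAT)


def approximate_age(record: dict) -> Optional[int]:
    """Submission year minus birth year.

    Only the birth year is known, so the result can be off by one year.
    Returns None when the birth year is missing (assumed possible).
    """
    birth_year = record.get("reviewer_birth_year")
    if birth_year is None:
        return None
    submitted = parse_submission_date(record["submission_date"])
    return submitted.year - int(birth_year)


# Values taken from the data instance shown in this card.
record = {"submission_date": "2018-01-02 06:23:22", "reviewer_birth_year": 1981}
submitted = parse_submission_date(record["submission_date"])
age = approximate_age(record)
```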
### Data Splits
| name |train|
|---------|----:|
|b2w-reviews01|132373|
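
A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and that the repository ID from the download link loads under the installed version (this has not been verified here). The helper names are hypothetical. The import is deferred so the snippet can be read or imported without network access.

```python
DATASET_ID = "ruanchaves/b2w-reviews01"  # repository ID from the download link
EXPECTED_TRAIN_ROWS = 132373  # row count from the splits table


def load_train_split():
    """Download the train split (needs network access and the `datasets` package)."""
    from datasets import load_dataset  # imported lazily, only when actually loading

    return load_dataset(DATASET_ID, split="train")


def has_expected_size(split) -> bool:
    """Check a loaded split (or any sized container) against the documented row count."""
    return len(split) == EXPECTED_TRAIN_ROWS
```

Calling `has_expected_size(load_train_split())` after a download is a cheap sanity check that the copy obtained matches the single `train` split listed in the table.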
### Citation Information
```
@inproceedings{real2019b2w,
title={B2W-reviews01: an open product reviews corpus},
author={Real, Livy and Oshiro, Marcio and Mafra, Alexandre},
booktitle={STIL-Symposium in Information and Human Language Technology},
year={2019}
}
```
### Contributions
Thanks to [@ruanchaves](https://github.com/ruanchaves) for adding this dataset.
Provider:
ruanchaves
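Background overview
B2W-Reviews01 is an open corpus of more than 130,000 Portuguese-language product reviews, collected from the Americanas.com e-commerce website between January and May 2018. It provides reviewer demographic information (such as gender, age, and geographical location) and two kinds of rating (a 5-star score and a binary "recommend to a friend" label), making it suitable for natural language processing tasks such as sentiment analysis and topic modeling.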



