ruanchaves/b2w-reviews01

Hugging Face | Updated: 2023-01-20 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/ruanchaves/b2w-reviews01
Resource overview:
---
annotations_creators:
- found
language:
- pt
language_creators:
- found
license:
- cc-by-4.0
multilinguality:
- monolingual
pretty_name: B2W-Reviews01
size_categories:
- 100M<n<1B
source_datasets:
- original
tags:
- reviews
task_categories:
- text-classification
task_ids:
- sentiment-analysis
- sentiment-scoring
- intent-classification
- topic-classification
---

# Dataset Card for B2W-Reviews01

## Dataset Description

- **Repository:** https://github.com/americanas-tech/b2w-reviews01
- **Paper:** http://comissoes.sbc.org.br/ce-pln/stil2019/proceedings-stil-2019-Final-Publicacao.pdf
- **Point of Contact:** Livy Real

### Dataset Summary

B2W-Reviews01 is an open corpus of product reviews. It contains more than 130k e-commerce customer reviews, collected from the Americanas.com website between January and May 2018. B2W-Reviews01 offers rich information about the reviewer profile, such as gender, age, and geographical location. The corpus also has two different review ratings:

* the usual 5-point scale rating, represented by stars on most e-commerce websites,
* a "recommend to a friend" label, a yes-or-no question representing the willingness of the customer to recommend the product to someone else.

### Supported Tasks and Leaderboards

* Sentiment Analysis
* Topic Modeling

### Languages

* Portuguese

## Dataset Structure

### Data Instances

```
{'submission_date': '2018-01-02 06:23:22',
 'reviewer_id': '6adc7901926fc1697d34181fbd88895976b4f3f31f0102d90217d248a1fad156',
 'product_id': '123911277',
 'product_name': 'Triciclo Gangorra Belfix Cabeça Cachorro Rosa',
 'product_brand': 'belfix',
 'site_category_lv1': 'Brinquedos',
 'site_category_lv2': 'Mini Veículos',
 'review_title': 'O produto não foi entregue',
 'overall_rating': 1,
 'recommend_to_a_friend': 'Yes',
 'review_text': 'Incrível o descaso com o consumidor. O produto não chegou, apesar de já ter sido pago. Não recebo qualquer informação sobre onde se encontra o produto, ou qualquer compensação do vendedor. Não recomendo.',
 'reviewer_birth_year': 1981,
 'reviewer_gender': 'M',
 'reviewer_state': 'RJ'}
```

### Data Fields

* **submission_date**: the date and time when the review was submitted, in the format `"%Y-%m-%d %H:%M:%S"`.
* **reviewer_id**: a unique identifier for the reviewer.
* **product_id**: a unique identifier for the product being reviewed.
* **product_name**: the name of the product being reviewed.
* **product_brand**: the brand of the product being reviewed.
* **site_category_lv1**: the top-level category of the product on the site where the review was submitted.
* **site_category_lv2**: the second-level category of the product on the site where the review was submitted.
* **review_title**: the title of the review.
* **overall_rating**: the overall star rating given by the reviewer, on a scale of 1 to 5.
* **recommend_to_a_friend**: whether the reviewer would recommend the product to a friend (Yes/No).
* **review_text**: the full text of the review.
* **reviewer_birth_year**: the birth year of the reviewer.
* **reviewer_gender**: the gender of the reviewer (F/M).
* **reviewer_state**: the Brazilian state of the reviewer (e.g. RJ).

### Data Splits

| name          |  train |
|---------------|-------:|
| b2w-reviews01 | 132373 |

### Citation Information

```
@inproceedings{real2019b2w,
  title={B2W-reviews01: an open product reviews corpus},
  author={Real, Livy and Oshiro, Marcio and Mafra, Alexandre},
  booktitle={STIL-Symposium in Information and Human Language Technology},
  year={2019}
}
```

### Contributions

Thanks to [@ruanchaves](https://github.com/ruanchaves) for adding this dataset.
Provider:
ruanchaves
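Original information summary

Dataset overview

Basic information

  • Dataset name: B2W-Reviews01
  • Language: Portuguese
  • License: CC-BY-4.0
  • Dataset size: 100M<n<1B (the size category declared in the card metadata; the single train split lists 132,373 rows)
  • Data source: original data
  • Tags: reviews
  • Task category: text classification
  • Task IDs:
    • Sentiment analysis
    • Sentiment scoring
    • Intent classification
    • Topic classification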

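Dataset description

  • Dataset summary: B2W-Reviews01 is an open corpus of more than 130k e-commerce customer reviews, collected from the Americanas.com website between January and May 2018. The corpus provides rich information about the reviewer profile, such as gender, age, and geographical location. It also contains two different review ratings:
    • The usual 5-point scale rating, represented by stars on most e-commerce websites.
    • A "recommend to a friend" label, a yes-or-no question representing the customer's willingness to recommend the product.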
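
Because each review carries both ratings, the two can be compared directly. The sketch below is only an illustration: the first toy row reuses the values of the sample instance in the card (1 star, yet `'Yes'`), the other rows are invented, and the `low=2` threshold for a "low" star rating is an assumption rather than anything the corpus defines.

```python
from collections import Counter

# Toy rows using the card's field names. Only the first row's values come
# from the sample instance in the card; the others are hypothetical.
rows = [
    {"overall_rating": 1, "recommend_to_a_friend": "Yes"},
    {"overall_rating": 5, "recommend_to_a_friend": "Yes"},
    {"overall_rating": 2, "recommend_to_a_friend": "No"},
    {"overall_rating": 4, "recommend_to_a_friend": "Yes"},
]


def recommends(row):
    """Map the Yes/No label to a boolean."""
    return row["recommend_to_a_friend"] == "Yes"


def crosstab(rows):
    """Count rows per (star rating, recommends) pair."""
    return Counter((r["overall_rating"], recommends(r)) for r in rows)


def disagreements(rows, low=2):
    """Rows with a star rating <= low (assumed threshold) that still recommend."""
    return [r for r in rows if r["overall_rating"] <= low and recommends(r)]


table = crosstab(rows)
conflicts = disagreements(rows)
print(table)
print(conflicts)
```

Rows returned by `disagreements` may be worth inspecting by hand before either rating is used as a sentiment label.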

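Dataset structure

  • Data instances: the sample record contains the submission date, reviewer ID, product ID, product name, product brand, site categories, review title, overall rating, recommend-to-a-friend label, review text, reviewer birth year, reviewer gender, and reviewer state.
  • Data fields:
    • submission_date: the date and time when the review was submitted.
    • reviewer_id: a unique identifier for the reviewer.
    • product_id: a unique identifier for the product being reviewed.
    • product_name: the name of the product being reviewed.
    • product_brand: the brand of the product being reviewed.
    • site_category_lv1: the top-level category of the product on the site where the review was submitted.
    • site_category_lv2: the second-level category of the product on the site where the review was submitted.
    • review_title: the title of the review.
    • overall_rating: the overall star rating given by the reviewer, from 1 to 5.
    • recommend_to_a_friend: whether the reviewer would recommend the product to a friend (Yes/No).
    • review_text: the full text of the review.
    • reviewer_birth_year: the birth year of the reviewer.
    • reviewer_gender: the gender of the reviewer (F/M).
    • reviewer_state: the Brazilian state of the reviewer.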
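
The fields listed above can be turned into typed values with the standard library alone. The following is a minimal sketch on a subset of the card's sample record; `parse_record`, `recommends`, and `approx_age` are hypothetical names introduced here, and the code assumes every field is present and well formed, which may not hold for every row of the real corpus.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # format stated in the dataset card

# Subset of the sample instance shown in the card.
record = {
    "submission_date": "2018-01-02 06:23:22",
    "overall_rating": 1,
    "recommend_to_a_friend": "Yes",
    "reviewer_birth_year": 1981,
    "reviewer_gender": "M",
    "reviewer_state": "RJ",
}


def parse_record(rec):
    """Return a copy with a parsed timestamp and two derived fields."""
    submitted = datetime.strptime(rec["submission_date"], DATE_FORMAT)
    rating = int(rec["overall_rating"])
    if not 1 <= rating <= 5:
        raise ValueError(f"overall_rating out of range: {rating}")
    out = dict(rec)
    out["submission_date"] = submitted
    out["recommends"] = rec["recommend_to_a_friend"] == "Yes"
    # Only the birth year is known, so the age can be off by one year.
    out["approx_age"] = submitted.year - int(rec["reviewer_birth_year"])
    return out


parsed = parse_record(record)
print(parsed["submission_date"].isoformat(), parsed["approx_age"])
```

Deriving the age from the submission year rather than the current year keeps the value stable no matter when the code is run.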

Data splits

  • Data splits:
    Name Train
    b2w-reviews01 132373

Citation information

@inproceedings{real2019b2w, title={B2W-reviews01: an open product reviews corpus}, author={Real, Livy and Oshiro, Marcio and Mafra, Alexandre}, booktitle={STIL-Symposium in Information and Human Language Technology}, year={2019} }

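Collected summary
Dataset introduction
Background and challenges
Background overview
B2W-Reviews01 is an open corpus of more than 130,000 Portuguese-language product reviews, collected from the Americanas.com e-commerce website between January and May 2018. The dataset provides reviewer demographic information (such as gender, age, and geographical location) and two kinds of rating (a 5-star rating and a binary recommend-to-a-friend label), and is suited to natural language processing tasks such as sentiment analysis and topic modeling.
The content above was collected and summarized by 遇见数据集.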