
turkish-nlp-suite/vitamins-supplements-reviews

Hugging Face · Updated 2024-07-15 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/turkish-nlp-suite/vitamins-supplements-reviews
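Resource summary:
A Turkish sentiment analysis dataset of customer reviews of supplement and vitamin products. The data was scraped from Vitaminler.com and contains customers' reviews of the products along with their star ratings. Each review describes the customer's experience with the product's effects, side effects, taste, and smell, and may comment on how often and in what dose the supplement is taken, its active ingredients, its brand, and similar products from other brands. Reviews also include indications of the customer's health history and of how the supplement helped with their health problems. The dataset contains 228K customer reviews of 1,052 products; each instance contains the product name, brand name, review text, and star rating. The dataset is divided into training, validation, and test sets.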
Provided by: turkish-nlp-suite
Summary of Original Information

Dataset card: turkish-nlp-suite/vitamins-supplements-reviews

Dataset Description

  • Dataset name: Vitamins and Supplements Reviews Dataset
  • Domain: E-commerce, customer reviews

Dataset Overview

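A sentiment analysis dataset of Turkish customer reviews of supplement and vitamin products. It was scraped from Vitaminler.com and contains customers' reviews and star ratings for vitamin and supplement products.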
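Because every review carries a star rating, one way to use the data for sentiment classification is to derive labels from the stars. The card does not define a label mapping, so this sketch is purely illustrative: it assumes ratings run from 1 to 5 and uses hypothetical thresholds (1-2 negative, 3 neutral, 4-5 positive). Treating the raw star value as a five-class target is an equally reasonable choice.

```python
def star_to_sentiment(star: int) -> str:
    """Map a star rating to a coarse sentiment label.

    Assumes a 1-5 scale; the thresholds below are hypothetical and are
    not part of the dataset card.
    """
    if not 1 <= star <= 5:
        raise ValueError(f"expected a star rating in 1..5, got {star}")
    if star <= 2:
        return "negative"
    if star == 3:
        return "neutral"
    return "positive"


# The sample instance in the card has "star": 5.
print(star_to_sentiment(5))  # positive
```

Collapsing five classes into three trades granularity for more balanced, easier-to-separate classes; which is preferable depends on the downstream task.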

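Each customer review describes the customer's experience with a supplement product, including its effects, side effects, taste, and smell, along with comments on how often and in what dose the supplement is taken, its active ingredients, its brand, and similar products from other brands. Reviews also contain indications of the customer's health history and of how the supplement helped with their health problems.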

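Given the nature of the data, our Vitamins and Supplements Reviews Dataset sits at the intersection of customer review data and medical NLP data. Our aim is to offer Turkish NLU a carefully compiled medical NLP dataset.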

Dataset Instances

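The dataset covers 1,052 products from 262 different brands, with 244K customer reviews. During compilation, we excluded reviews that contained customer names or influencer names.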
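The card does not say how reviews mentioning customer or influencer names were detected. As a minimal sketch, assuming a hand-curated blocklist of names (the `NAME_BLOCKLIST` entries and both helper functions are hypothetical), a review can be dropped whenever one of its word tokens matches a listed name. This is not the authors' procedure; a fixed list cannot catch names it does not contain, so a named entity recognizer is a natural alternative.

```python
import re

# Hypothetical blocklist for illustration only; the names (and the method)
# actually used when compiling the dataset are not published in the card.
NAME_BLOCKLIST = {"ayşe", "mehmet"}

# \w is Unicode-aware for str patterns, so Turkish letters (ş, ğ, ı, ...) match.
# An apostrophe splits a name from its suffix: "Mehmet'in" -> "Mehmet", "in".
TOKEN_RE = re.compile(r"\w+")


def mentions_name(text: str, blocklist: set[str]) -> bool:
    """Return True if any whole word token in `text` matches a blocklisted name.

    Note: str.casefold() is not Turkish-locale aware ("I" folds to "i", not "ı"),
    so blocklist entries containing I/İ need extra care.
    """
    return any(tok.casefold() in blocklist for tok in TOKEN_RE.findall(text))


def drop_reviews_with_names(reviews: list[str], blocklist: set[str]) -> list[str]:
    """Keep only the reviews that mention none of the blocklisted names."""
    return [r for r in reviews if not mentions_name(r, blocklist)]


reviews = [
    "Mehmet'in önerisiyle aldım, çok memnunum",
    "Bittikçe alıyorum harika bişey kızım tadını da seviyo",
]
print(drop_reviews_with_names(reviews, NAME_BLOCKLIST))
```

Matching whole tokens rather than substrings avoids dropping reviews where a listed name is only part of a longer word (for example "Ayşegül" does not match "ayşe").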

Each dataset instance contains:

  • Product name (product_name)
  • Brand name (brand)
  • Customer review text (review)
  • Star rating (star)

Example:

```json
{
  "product_name": "Microfer Şurup 250 ml",
  "brand": "Ocean",
  "review": "Bittikçe alıyorum harika bişey kızım tadını da seviyo",
  "star": 5
}
```
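The four fields of an instance map naturally onto a small typed record. In this sketch the class name `ReviewInstance` and the helper `parse_instance` are hypothetical; the helper takes a single record as a JSON string shaped like the example and makes no assumption about how the dataset is stored on disk. It also checks that `star` is an integer.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewInstance:
    """One review record; field names follow the JSON example in the card."""
    product_name: str
    brand: str
    review: str
    star: int


def parse_instance(raw: str) -> ReviewInstance:
    """Parse one JSON object into a ReviewInstance (hypothetical helper)."""
    record = json.loads(raw)
    star = record["star"]
    if not isinstance(star, int):
        raise TypeError(f"'star' should be an integer, got {star!r}")
    return ReviewInstance(
        product_name=record["product_name"],
        brand=record["brand"],
        review=record["review"],
        star=star,
    )


example = (
    '{"product_name": "Microfer Şurup 250 ml", "brand": "Ocean", '
    '"review": "Bittikçe alıyorum harika bişey kızım tadını da seviyo", "star": 5}'
)
instance = parse_instance(example)
print(instance.brand, instance.star)  # Ocean 5
```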

Data Splits

| Name | Train | Validation | Test |
|---|---|---|---|
| Vitamins and Supplements Reviews | 200,866 | 20,000 | 20,000 |
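The split sizes in the table can be totalled to see how the data is partitioned. This is plain arithmetic on the three published numbers: they sum to 240,866 reviews, with about 83.4% in training and about 8.3% each in validation and test. The names `SPLITS` and `split_fractions` are illustrative.

```python
# Split sizes copied from the table above.
SPLITS = {"train": 200_866, "validation": 20_000, "test": 20_000}


def split_fractions(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of reviews."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total = sum(SPLITS.values())
fractions = split_fractions(SPLITS)
print(total)  # 240866
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```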

Citation

If you use this dataset in your work, please cite the following paper:

```bibtex
@inproceedings{altinok-2023-diverse,
    title = "A Diverse Set of Freely Available Linguistic Resources for {T}urkish",
    author = "Altinok, Duygu",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.768",
    pages = "13739--13750",
    abstract = "This study presents a diverse set of freely available linguistic resources for Turkish natural language processing, including corpora, pretrained models and education material. Although Turkish is spoken by a sizeable population of over 80 million people, Turkish linguistic resources for natural language processing remain scarce. In this study, we provide corpora to allow practitioners to build their own applications and pretrained models that would assist industry researchers in creating quick prototypes. The provided corpora include named entity recognition datasets of diverse genres, including Wikipedia articles and supplement products customer reviews. In addition, crawling e-commerce and movie reviews websites, we compiled several sentiment analysis datasets of different genres. Our linguistic resources for Turkish also include pretrained spaCy language models. To the best of our knowledge, our models are the first spaCy models trained for the Turkish language. Finally, we provide various types of education material, such as video tutorials and code examples, that can support the interested audience on practicing Turkish NLP. The advantages of our linguistic resources are three-fold: they are freely available, they are first of their kind, and they are easy to use in a broad range of implementations. Along with a thorough description of the resource creation process, we also explain the position of our resources in the Turkish NLP world.",
}
```
