turkish-nlp-suite/vitamins-supplements-reviews
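Dataset Description
- Dataset name: Vitamins and Supplements Reviews Dataset
- Domain: e-commerce, customer reviews
Dataset Summary
A sentiment analysis dataset of Turkish customer reviews about supplement and vitamin products. The dataset was scraped from Vitaminler.com and contains customer reviews and star ratings for vitamin and supplement products.
Each customer review describes the customer's experience with a supplement product, covering its effectiveness, side effects, taste, and smell, along with comments on usage frequency and dosage, active ingredients, the brand, and similar products from other brands. Reviews also include indications of the customer's health history and of how the supplement helped with the customer's health problems.
Given the nature of the data, our Vitamins and Supplements Reviews Dataset sits at the intersection of customer review data and medical NLP data. We hope to provide a carefully compiled medical NLP dataset for Turkish NLU.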
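Dataset Instances
The dataset covers 1,052 products from 262 distinct brands, with 244K customer reviews. During compilation, we excluded reviews containing customer names and influencer names.
Each dataset instance contains:
- Product name
- Brand name
- Customer review text
- Star rating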
Example:
```json
{
  "product_name": "Microfer Şurup 250 ml",
  "brand": "Ocean",
  "review": "Bittikçe alıyorum harika bişey kızım tadını da seviyo",
  "star": 5
}
```
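The instance layout above can be checked with a few lines of Python. This is a minimal sketch using only the standard library: the four field names come from the example instance, while `parse_instance`, `star_to_sentiment`, the 1 to 5 star range, and the star-to-label thresholds are illustrative assumptions, not something the dataset defines.

```python
import json

# Field names and types as they appear in the example instance.
REQUIRED_FIELDS = {"product_name": str, "brand": str, "review": str, "star": int}


def parse_instance(raw: str) -> dict:
    """Parse one JSON instance and check that the four expected fields exist."""
    record = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise TypeError(f"{field} should be of type {expected_type.__name__}")
    return record


def star_to_sentiment(star: int) -> str:
    """Hypothetical mapping from stars to a coarse label (an assumption)."""
    if not 1 <= star <= 5:  # assumed rating scale
        raise ValueError(f"star rating out of range: {star}")
    if star <= 2:
        return "negative"
    if star == 3:
        return "neutral"
    return "positive"


raw = (
    '{"product_name": "Microfer Şurup 250 ml", "brand": "Ocean", '
    '"review": "Bittikçe alıyorum harika bişey kızım tadını da seviyo", '
    '"star": 5}'
)
record = parse_instance(raw)
label = star_to_sentiment(record["star"])
print(record["brand"], label)
```

Whether three coarse labels or the raw star values make the better target depends on the downstream task; the thresholds here are only one possible choice.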
Data Splits
| Name | Train | Validation | Test |
|---|---|---|---|
| Vitamins and Supplements Reviews | 200866 | 20000 | 20000 |
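The split sizes in the table translate into proportions as in the sketch below. The sizes are copied from the table; the split keys `train`, `validation`, and `test`, the helper `load_splits`, and the assumption that the dataset is hosted on the Hugging Face Hub under the identifier in the title are illustrative and not confirmed by this card. The loader needs the third-party `datasets` package and network access, so it is defined but not called.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"train": 200866, "validation": 20000, "test": 20000}

# The three splits sum to 240866 instances.
total = sum(SPLIT_SIZES.values())
shares = {name: size / total for name, size in SPLIT_SIZES.items()}

for name, share in shares.items():
    print(f"{name}: {SPLIT_SIZES[name]} instances ({share:.1%})")


def load_splits():
    """Load all splits from the Hugging Face Hub (assumed hosting location).

    Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset("turkish-nlp-suite/vitamins-supplements-reviews")
```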
Citation
If you use this dataset in your own work, please cite the following paper:
@inproceedings{altinok-2023-diverse,
    title = "A Diverse Set of Freely Available Linguistic Resources for {T}urkish",
    author = "Altinok, Duygu",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.768",
    pages = "13739--13750",
    abstract = "This study presents a diverse set of freely available linguistic resources for Turkish natural language processing, including corpora, pretrained models and education material. Although Turkish is spoken by a sizeable population of over 80 million people, Turkish linguistic resources for natural language processing remain scarce. In this study, we provide corpora to allow practitioners to build their own applications and pretrained models that would assist industry researchers in creating quick prototypes. The provided corpora include named entity recognition datasets of diverse genres, including Wikipedia articles and supplement products customer reviews. In addition, crawling e-commerce and movie reviews websites, we compiled several sentiment analysis datasets of different genres. Our linguistic resources for Turkish also include pretrained spaCy language models. To the best of our knowledge, our models are the first spaCy models trained for the Turkish language. Finally, we provide various types of education material, such as video tutorials and code examples, that can support the interested audience on practicing Turkish NLP. The advantages of our linguistic resources are three-fold: they are freely available, they are first of their kind, and they are easy to use in a broad range of implementations. Along with a thorough description of the resource creation process, we also explain the position of our resources in the Turkish NLP world.",
}



