allegro/klej-polemo2-out
Dataset Overview
Name: PolEmo2.0-OUT
Language: Polish (pl)
License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Multilinguality: monolingual
Size: 1K<n<10K
Source: original data
Task category: text classification
Task ID: sentiment classification
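Dataset Description
PolEmo2.0 is a dataset of online consumer reviews from four domains: medicine, hotels, products, and university. It contains over 8,000 reviews, about 85% of which come from the medicine and hotel domains. Both the reviews and their individual sentences were annotated manually.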
Task Details
Input: sentence
Output: sentence sentiment label (zero: neutral, minus: negative, plus: positive, amb: ambiguous)
Metric: accuracy
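The metric is plain accuracy over the four sentiment labels. A minimal sketch, assuming gold labels and predictions are given as lists of the short label names used above (zero, minus, plus, amb); the label strings stored in the published dataset files may be encoded differently and would need mapping first. The `accuracy` function is a hypothetical helper, not part of any official KLEJ evaluation script.

```python
from typing import Sequence

# Short label names from the task details (assumed; raw dataset strings may differ).
LABELS = {"zero", "minus", "plus", "amb"}


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of sentences whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Reject label names outside the four-class scheme.
    unknown = (set(gold) | set(pred)) - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


if __name__ == "__main__":
    gold = ["minus", "plus", "amb", "zero"]
    pred = ["minus", "plus", "plus", "zero"]
    print(accuracy(gold, pred))  # 0.75
```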
Data Splits
| Subset | Size |
|---|---|
| train | 5783 |
| test | 722 |
| validation | 723 |
Class Distribution
| Class | Sentiment | Train | Validation | Test |
|---|---|---|---|---|
| minus | negative | 0.379 | 0.334 | 0.368 |
| plus | positive | 0.271 | 0.332 | 0.302 |
| amb | ambiguous | 0.182 | 0.332 | 0.328 |
| zero | neutral | 0.168 | 0.002 | 0.002 |
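The class proportions in this table can be computed from a list of labels by counting and normalising. They also give the accuracy of a trivial majority-class baseline: the most frequent training class is minus (0.379), so always predicting it would score its test share, 0.368. A minimal sketch; the proportions are copied from the table, and `class_distribution` and `majority_baseline` are hypothetical helpers for illustration, not part of the dataset's tooling.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Proportions copied from the class distribution table (train and test columns).
TRAIN = {"minus": 0.379, "plus": 0.271, "amb": 0.182, "zero": 0.168}
TEST = {"minus": 0.368, "plus": 0.302, "amb": 0.328, "zero": 0.002}


def class_distribution(labels: Iterable[str], digits: int = 3) -> Dict[str, float]:
    """Share of each label, rounded to the table's three decimal places."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(n / total, digits) for label, n in counts.items()}


def majority_baseline(train: Dict[str, float], test: Dict[str, float]) -> Tuple[str, float]:
    """Pick the most frequent training class; its test share is the baseline accuracy."""
    majority = max(train, key=train.get)
    return majority, test.get(majority, 0.0)


if __name__ == "__main__":
    print(class_distribution(["minus", "minus", "plus", "amb"]))
    # {'minus': 0.5, 'plus': 0.25, 'amb': 0.25}
    print(majority_baseline(TRAIN, TEST))  # ('minus', 0.368)
```

Note that zero makes up 16.8% of the training set but only 0.2% of the validation and test sets, so performance on neutral sentences has almost no effect on the reported accuracy.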
Citation Information
@inproceedings{kocon-etal-2019-multi,
    title = "Multi-Level Sentiment Analysis of {P}ol{E}mo 2.0: Extended Corpus of Multi-Domain Consumer Reviews",
    author = "Koco{\'n}, Jan and Mi{\l}kowski, Piotr and Za{\'s}ko-Zieli{\'n}ska, Monika",
    booktitle = "Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/K19-1092",
    doi = "10.18653/v1/K19-1092",
    pages = "980--991",
    abstract = "In this article we present an extended version of PolEmo {--} a corpus of consumer reviews from 4 domains: medicine, hotels, products and school. Current version (PolEmo 2.0) contains 8,216 reviews having 57,466 sentences. Each text and sentence was manually annotated with sentiment in 2+1 scheme, which gives a total of 197,046 annotations. We obtained a high value of Positive Specific Agreement, which is 0.91 for texts and 0.88 for sentences. PolEmo 2.0 is publicly available under a Creative Commons copyright license. We explored recent deep learning approaches for the recognition of sentiment, such as Bi-directional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT).",
}



