Alienmaster/german_politicians_twitter_sentiment
Hugging Face, updated 2024-04-23, indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/Alienmaster/german_politicians_twitter_sentiment
Resource description:
---
language:
- de
task_categories:
- text-classification
size_categories:
- 1K<n<10K
multilinguality:
- monolingual
configs:
- config_name: default
  data_files:
  - split: train
    path: "train.parquet"
  - split: test
    path: "test.parquet"
---
## Information
This dataset contains 1,785 manually annotated tweets by German politicians from the 2021 federal election year (01.01.2021 to 31.12.2021).
The tweets were annotated by six academics split into two groups of three, so each group labeled the sentiment of roughly 900 tweets. The final label of each tweet is the majority vote of its three annotators. Inter-annotator agreement, measured with Kappa, was moderate.
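The card does not name the Kappa variant or say how ties were resolved. The sketch below illustrates both steps under stated assumptions. It uses Fleiss' kappa, a common choice when every item is rated by the same number of annotators, and it treats a three-way split such as (1, 2, 3) as having no majority. The function names are hypothetical, and the votes in the example are invented toy data that do not come from the dataset.

```python
from collections import Counter

LABELS = (1, 2, 3)  # 1 = positive, 2 = negative, 3 = neutral (scheme from this card)


def majority_label(votes):
    """Return the label chosen by more than half of the annotators, or None if there is no majority.

    Assumption: the card does not say how the authors resolved three-way splits.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def fleiss_kappa(ratings, categories=LABELS):
    """Fleiss' kappa for items that were each rated by the same number (>= 2) of annotators."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j]: number of annotators who assigned category j to item i
    counts = [[list(votes).count(c) for c in categories] for votes in ratings]
    # Mean observed agreement across all items
    p_bar = sum(
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts
    ) / n_items
    # Expected chance agreement, computed from the overall category proportions
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(len(categories))]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls into one category")
    return (p_bar - p_e) / (1 - p_e)


# Toy example: four tweets with three annotators each (invented votes)
ratings = [(1, 1, 1), (2, 2, 3), (3, 3, 3), (1, 2, 3)]
print([majority_label(v) for v in ratings])  # [1, 2, 3, None]
print(round(fleiss_kappa(ratings), 3))       # 0.362
```

Because the two groups annotated different tweets, agreement in this dataset would be computed separately for each group of three annotators.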
## Preprocessing
The source for this version of the dataset is located [here](https://github.com/NilsHellwig/Twitter_German_Federal_Election_Perception_2021/tree/main/Datasets/Schmidt2022).
To simplify processing, line breaks were removed from the tweet texts.
The reply, retweet, and favorite counts were also removed from the text, as was the phrase "Diesen Thread anzeigen" ("Show this thread"). Neither is part of the original tweet; both were most likely added by the crawling tool.
The preprocessing steps can be reproduced with the `cleaner.py` script.
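The sketch below restates the cleaning steps described above as code. It is a hypothetical re-implementation, not the contents of `cleaner.py`. The card does not describe how the reply, retweet, and favorite counts appear in the raw text, so the sketch assumes they are three trailing whitespace-separated numbers. The names `clean_tweet`, `THREAD_PHRASE`, and `TRAILING_COUNTS` are illustrative.

```python
import re

# "Show this thread": added by the crawler, not part of the tweet
THREAD_PHRASE = "Diesen Thread anzeigen"
# Assumption: the reply, retweet and favorite counts were appended as three trailing
# whitespace-separated numbers (digits, optionally with "." thousands separators).
TRAILING_COUNTS = re.compile(r"(?:\s+\d[\d.]*){3}\s*$")


def clean_tweet(text: str) -> str:
    """Hypothetical cleaner mirroring the steps described above; not the actual cleaner.py."""
    text = text.replace(THREAD_PHRASE, " ")
    text = re.sub(r"[\r\n]+", " ", text)         # remove line breaks
    text = TRAILING_COUNTS.sub("", text)         # remove assumed reply/retweet/favorite counts
    return re.sub(r"\s{2,}", " ", text).strip()  # collapse leftover whitespace


print(clean_tweet("Heute im Bundestag.\nDiesen Thread anzeigen\n12 34 56"))
# Heute im Bundestag.
```

Under this assumption, a tweet whose own text ends in three numbers would also be truncated. Check the pattern against `cleaner.py` before relying on it.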
## Annotation
The tweets were annotated as follows:
- 1 if the sentiment of the tweet is positive
- 2 if the sentiment of the tweet is negative
- 3 if the sentiment of the tweet is neutral
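The labels run from 1 to 3, so a training pipeline that expects 0-based class indices (PyTorch's `CrossEntropyLoss`, for instance) needs a remapping step. The following is a minimal sketch. It assumes the label column holds the plain integers listed above, since the card does not name the column. `to_class_index` and `label_distribution` are hypothetical helpers.

```python
from collections import Counter

# Label scheme as documented in this card
LABEL_NAMES = {1: "positive", 2: "negative", 3: "neutral"}


def to_class_index(label: int) -> int:
    """Map a 1-based card label to a 0-based class index (0=positive, 1=negative, 2=neutral)."""
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected label: {label!r}")
    return label - 1


def label_distribution(labels):
    """Count each sentiment class in a sequence of raw 1-3 labels."""
    counts = Counter(LABEL_NAMES[label] for label in labels)
    return {name: counts[name] for name in LABEL_NAMES.values()}


print([to_class_index(label) for label in (1, 2, 3)])  # [0, 1, 2]
print(label_distribution([1, 3, 3, 2]))
# {'positive': 1, 'negative': 1, 'neutral': 2}
```

Raising an error on unexpected values, rather than passing them through silently, catches a mislabeled or differently encoded column early.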
## Citation
```
@inproceedings{schmidt-etal-2022-sentiment,
title = "Sentiment Analysis on {T}witter for the Major {G}erman Parties during the 2021 {G}erman Federal Election",
author = "Schmidt, Thomas and
Fehle, Jakob and
Weissenbacher, Maximilian and
Richter, Jonathan and
Gottschalk, Philipp and
Wolff, Christian",
editor = "Schaefer, Robin and
Bai, Xiaoyu and
Stede, Manfred and
Zesch, Torsten",
booktitle = "Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)",
month = "12--15 " # sep,
year = "2022",
address = "Potsdam, Germany",
publisher = "KONVENS 2022 Organizers",
url = "https://aclanthology.org/2022.konvens-1.9",
pages = "74--87",
}
```
Provided by:
Alienmaster
Summary of the original information
Dataset overview
Basic information
- Language: German
- Task category: text classification
- Dataset size: 1K<n<10K
- Multilinguality: monolingual
- Configurations:
  - Default configuration:
    - Training data: train.parquet
    - Test data: test.parquet
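Data content
- Data source: 1,785 manually annotated tweets by German politicians from the 2021 election year (1 January 2021 to 31 December 2021).
- Annotation process: six academics annotated the tweets in two groups of three, each group covering about 900 tweets. Each tweet's label was determined by majority vote. Inter-annotator agreement (Kappa) was moderate.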
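Data preprocessing
- Text processing: line breaks, the reply, retweet, and favorite counts, and the phrase "Diesen Thread anzeigen" ("Show this thread") were removed from the texts.
- Reproducibility: the preprocessing steps can be reproduced with the cleaner.py script.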
Annotation scheme
- Sentiment classes:
  - 1: positive
  - 2: negative
  - 3: neutral