Alienmaster/german_politicians_twitter_sentiment

Source: Hugging Face | Updated: 2024-04-23 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/Alienmaster/german_politicians_twitter_sentiment
Resource description:
---
language:
- de
task_categories:
- text-classification
size_categories:
- 1K<n<10K
multilinguality:
- monolingual
configs:
- config_name: default
  data_files:
  - split: train
    path: "train.parquet"
  - split: test
    path: "test.parquet"
---

## Information

This dataset contains 1,785 manually annotated tweets by German politicians from the 2021 election year (01.01.2021 - 31.12.2021). The tweets were annotated by six academics split into two groups of three, so each group annotated the sentiment of roughly 900 tweets. For every tweet, the majority label was taken as the final label. Inter-annotator agreement, measured with Kappa, was moderate.

## Preprocessing

The source for this version of the dataset is located [here](https://github.com/NilsHellwig/Twitter_German_Federal_Election_Perception_2021/tree/main/Datasets/Schmidt2022). To simplify processing, line breaks were removed from the tweet texts. The reply, retweet, and favorite counts were also removed from the text, as was the phrase "Diesen Thread anzeigen" ("Show this thread"). Neither is part of the tweet itself; both were most likely added by the crawling tool. The preprocessing steps can be reproduced with the `cleaner.py` script.

## Annotation

The tweets were annotated as follows:

- 1 if the sentiment of the tweet is positive
- 2 if the sentiment of the tweet is negative
- 3 if the sentiment of the tweet is neutral

## Citation

```
@inproceedings{schmidt-etal-2022-sentiment,
    title = "Sentiment Analysis on {T}witter for the Major {G}erman Parties during the 2021 {G}erman Federal Election",
    author = "Schmidt, Thomas and Fehle, Jakob and Weissenbacher, Maximilian and Richter, Jonathan and Gottschalk, Philipp and Wolff, Christian",
    editor = "Schaefer, Robin and Bai, Xiaoyu and Stede, Manfred and Zesch, Torsten",
    booktitle = "Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)",
    month = "12--15 " # sep,
    year = "2022",
    address = "Potsdam, Germany",
    publisher = "KONVENS 2022 Organizers",
    url = "https://aclanthology.org/2022.konvens-1.9",
    pages = "74--87",
}
```

Provider:
Alienmaster
Summary of the original information

Dataset overview

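Basic information

  • Language: German
  • Task category: text classification
  • Dataset size: 1K<n<10K
  • Multilinguality: monolingual
  • Configuration:
    • Default configuration:
      • Training data: train.parquet
      • Test data: test.parquet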

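Data content

  • Source: 1,785 manually annotated tweets by German politicians from the 2021 election year (January 1, 2021 to December 31, 2021).
  • Annotation process: six academics were split into two groups of three, and each group annotated roughly 900 tweets. The label for each tweet was determined by majority vote. Inter-annotator agreement (Kappa) was moderate.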
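The majority-vote step and the agreement measure can be sketched in Python. This is a minimal sketch, not the authors' code. The source does not say which Kappa variant was used, so Fleiss' kappa (for a fixed number of raters per item) is assumed here. Tie handling is also an assumption: with three annotators and three classes, each annotator can pick a different label, and the hypothetical `majority_label` helper returns `None` in that case. On the commonly used Landis and Koch scale, "moderate" corresponds to a kappa between 0.41 and 0.60.

```python
from collections import Counter
from typing import Optional, Sequence

# Label scheme from the dataset card: 1 = positive, 2 = negative, 3 = neutral.
LABELS = (1, 2, 3)


def majority_label(votes: Sequence[int]) -> Optional[int]:
    """Return the label picked by a strict majority of annotators, or None on a tie.

    Returning None for ties is an assumption; the card does not say how ties were resolved.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def fleiss_kappa(ratings: Sequence[Sequence[int]], labels: Sequence[int] = LABELS) -> float:
    """Fleiss' kappa for items that were each rated by the same number of annotators."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j]: how many annotators gave label j to item i.
    counts = [[Counter(item)[label] for label in labels] for item in ratings]

    # Observed agreement: mean share of agreeing annotator pairs per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall label proportions.
    total = n_items * n_raters
    p_e = sum((sum(row[j] for row in counts) / total) ** 2 for j in range(len(labels)))
    if p_e == 1:  # every vote used the same label; kappa is undefined, 1.0 is an arbitrary convention
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Toy ratings (made up, not taken from the dataset): four tweets, three annotators each.
toy = [[1, 1, 2], [2, 2, 2], [3, 3, 1], [1, 1, 1]]
print([majority_label(votes) for votes in toy])  # [1, 2, 3, 1]
print(round(fleiss_kappa(toy), 3))  # 0.455
```

For these toy ratings the kappa is exactly 5/11, which would fall in the "moderate" band; the value says nothing about the real annotations.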

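Data preprocessing

  • Text cleaning: line breaks, the reply, retweet, and favorite counts, and the phrase "Diesen Thread anzeigen" ("Show this thread") were removed from the text.
  • Reproducibility: the preprocessing steps can be reproduced with the cleaner.py script.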
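Two of the documented cleaning steps can be approximated as follows. This is a hypothetical helper, not the `cleaner.py` script, whose contents are not shown here. It only removes the "Diesen Thread anzeigen" phrase and line breaks. Removal of the reply, retweet, and favorite counts is deliberately left out, because the layout of those numbers in the raw crawl is not described; `cleaner.py` remains the reference for that step.

```python
import re

# Phrase appended by the crawling tool ("Show this thread").
THREAD_PHRASE = "Diesen Thread anzeigen"


def clean_tweet(text: str) -> str:
    """Approximate two of the documented cleaning steps (hypothetical helper, not cleaner.py)."""
    # Drop the crawler-added "show this thread" phrase wherever it occurs.
    text = text.replace(THREAD_PHRASE, " ")
    # Replace runs of line breaks (\r\n, \r, \n) with a single space.
    text = re.sub(r"[\r\n]+", " ", text)
    # Collapse any remaining whitespace runs (including tabs) and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


raw = "Guten Morgen!\nHeute im Bundestag.\nDiesen Thread anzeigen"
print(clean_tweet(raw))  # Guten Morgen! Heute im Bundestag.
```

Collapsing all whitespace is a simplification; it also merges double spaces that may have been present in the original tweet.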

Annotation scheme

  • Sentiment classes:
    • 1: positive
    • 2: negative
    • 3: neutral
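The labels are 1-based. Many classification libraries expect contiguous class indices starting at 0, so a small mapping such as the one below can be convenient when training on this data. The helper names are hypothetical, and the name of the label column in the parquet files is not stated in this description, so the sketch works on plain integers.

```python
# Label scheme from the dataset card: 1 = positive, 2 = negative, 3 = neutral.
ID2NAME = {1: "positive", 2: "negative", 3: "neutral"}
NAME2ID = {name: label for label, name in ID2NAME.items()}


def to_zero_based(label: int) -> int:
    """Shift a 1-based dataset label to a 0-based class index (hypothetical helper)."""
    if label not in ID2NAME:
        raise ValueError(f"unexpected label: {label!r}")
    return label - 1


def from_zero_based(index: int) -> int:
    """Map a 0-based class index back to the dataset's 1-based label."""
    label = index + 1
    if label not in ID2NAME:
        raise ValueError(f"unexpected class index: {index!r}")
    return label


print([ID2NAME[label] for label in (1, 3, 2)])  # ['positive', 'neutral', 'negative']
print(to_zero_based(3))  # 2
```

Validating the input in both directions catches labels outside 1 to 3 early instead of producing silently shifted class indices.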

