webimmunization/COVID-19-conspiracy-theories-tweets

Name: webimmunization/COVID-19-conspiracy-theories-tweets
Creator: webimmunization
Published: 2024-02-11 19:14:06
License: cc-by-4.0

Source: Hugging Face. Updated 2024-02-11; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/webimmunization/COVID-19-conspiracy-theories-tweets


Resource description:

---
license: cc-by-4.0
size_categories:
- 1K<n<10K
task_categories:
- text-classification
tags:
- twitter
- social_science
- misinformation
- fake_news
- conspiracy_theory
language:
- en
---

## Dataset Description

- **Paper:** [More Information Needed]
- **Point of Contact:** izabela.krysinska@doctorate.put.poznan.pl

### Dataset Summary

This dataset consists of 6591 tweets generated by the GPT-3.5 model. Each tweet is paired with a conspiracy theory related to the COVID-19 pandemic and carries a label that represents its output class. The possible labels are support, deny, and neutral.

- **support**: the tweet suggests support for the conspiracy theory
- **deny**: the tweet contradicts the conspiracy theory
- **neutral**: the tweet is mostly informative and does not show emotion against the conspiracy theory

The dataset can be used to train a classification model.

### Languages

English

## Dataset Structure

### Data Instances

```
{
  'tweet': 'Is the Chinese government exploiting the pandemic to gain an economic advantage? #COVIDEconomy #ChineseTradeWar',
  'conspiracy_theory': 'CT_3',
  'label': 'support'
}
```

### Data Fields

- `tweet`: a text generated by GPT-3.5 (input)
- `conspiracy_theory`: a conspiracy theory identifier
- `label`: the class label, one of support/deny/neutral

Conspiracy theories mapping:

1. **CT1: Deliberate strategy to create economic instability or benefit large corporations.** The coronavirus or the government's response to it is a deliberate strategy to create economic instability or to benefit large corporations over small businesses.
2. **CT2: Public was intentionally misled about the true nature of the virus and prevention.** The public is being intentionally misled about the true nature of the coronavirus, its risks, or the efficacy of certain treatments or prevention methods.
3. **CT3: Human made and bioweapon.** The coronavirus was created intentionally, made by humans, or as a bioweapon.
4. **CT4: Governments and politicians spread misinformation.** Politicians or government agencies are intentionally spreading false information, or they have some other motive for the way they are responding to the coronavirus.
5. **CT5: The Chinese intentionally spread the virus.** The Chinese government intentionally created or spread the coronavirus to harm other countries.
6. **CT6: Vaccines are unsafe.** The coronavirus vaccine is either unsafe or part of a larger plot to control people or reduce the population.

### Data Splits

The dataset contains a training split only, which consists of 6591 items.

## Dataset Creation

The dataset was generated with GPT-3.5 using the following prompts for the support, deny, and neutral classes respectively:

**support**

Consider the following conspiracy theory: X. Generate 50 tweets that support this conspiracy theory. Try to use hashtags that might promote this particular conspiracy theory. Try to use words and terms related to the COVID pandemic. Do not quote the conspiracy theory verbatim. Do not repeat tweets and try to make them diversified. Keep each tweet below the 280 character length limit. Present the tweets as a list.

**deny**

Consider the following conspiracy theory: X. Generate 50 tweets that contradict this conspiracy theory. Try to use hashtags that might debunk this particular conspiracy theory. Try to use words and terms related to the COVID pandemic. Do not quote the conspiracy theory verbatim. Do not repeat tweets and try to make them diversified. Keep each tweet below the 280 character length limit. Present the tweets as a list.

**neutral**

Consider the following conspiracy theory: X. Generate 50 tweets that are about COVID-19 but unrelated to the conspiracy theory. Try to use hashtags that might be used in such a tweet. Try to use words and terms related to the COVID pandemic. Do not quote the conspiracy theory verbatim. Do not repeat tweets and try to make them diversified. Keep each tweet below the 280 character length limit. Present the tweets as a list.

### Known Limitations

The generated tweets are sometimes formulaic and lack diversity.

### Citation Information

```
@article{article_id,
  author = {Author List},
  title = {Dataset Paper Title},
  journal = {Publication Venue},
  year = {2525}
}
```

Provider:

webimmunization

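Summary of Original Information

Dataset Description

Dataset Summary

The dataset contains 6591 tweets generated by the GPT-3.5 model, each paired with a conspiracy theory related to the COVID-19 pandemic. Every tweet carries a label for its output class; the possible labels are support, deny, and neutral.

support: the tweet supports the conspiracy theory
deny: the tweet contradicts the conspiracy theory
neutral: the tweet is mostly informative and does not show emotion against the conspiracy theory

The dataset can be used to train a classification model.

Languages

English

Dataset Structure

Data Instances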

{
  "tweet": "Is the Chinese government exploiting the pandemic to gain an economic advantage? #COVIDEconomy #ChineseTradeWar",
  "conspiracy_theory": "CT_3",
  "label": "support"
}

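Data Fields

tweet: text generated by GPT-3.5 (input)
conspiracy_theory: conspiracy theory identifier
label: the class label, one of support/deny/neutral

Conspiracy theories mapping:

CT1: Deliberate strategy to create economic instability or benefit large corporations
CT2: Public was intentionally misled about the true nature of the virus and prevention
CT3: Human made and bioweapon
CT4: Governments and politicians spread misinformation
CT5: The Chinese intentionally spread the virus
CT6: Vaccines are unsafe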
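The field list and the identifier mapping above can be combined into a small lookup and validation helper. The following is a minimal sketch, not part of the dataset's tooling: the names `CT_TITLES`, `normalize_ct_id`, and `describe` are hypothetical. Note that the sample record uses `CT_3` while the mapping lists `CT3`; the sketch assumes both spellings refer to the same theory, which the source does not confirm.

```python
import re

# Short titles copied from the conspiracy-theory mapping in the dataset card.
CT_TITLES = {
    "CT1": "Deliberate strategy to create economic instability or benefit large corporations",
    "CT2": "Public was intentionally misled about the true nature of the virus and prevention",
    "CT3": "Human made and bioweapon",
    "CT4": "Governments and politicians spread misinformation",
    "CT5": "The Chinese intentionally spread the virus",
    "CT6": "Vaccines are unsafe",
}

# The three output classes named in the dataset card.
VALID_LABELS = {"support", "deny", "neutral"}


def normalize_ct_id(raw: str) -> str:
    """Map 'CT_3' or 'CT3' to 'CT3'.

    Assumption: the underscore form seen in the sample record and the
    plain form used in the mapping identify the same theory.
    """
    match = re.fullmatch(r"CT_?([1-6])", raw.strip())
    if match is None:
        raise ValueError(f"unknown conspiracy theory id: {raw!r}")
    return f"CT{match.group(1)}"


def describe(record: dict) -> str:
    """Validate one record and return a readable one-line summary."""
    label = record["label"]
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    ct_id = normalize_ct_id(record["conspiracy_theory"])
    return f"{label} / {ct_id}: {CT_TITLES[ct_id]}"


if __name__ == "__main__":
    sample = {
        "tweet": "Is the Chinese government exploiting the pandemic to gain "
                 "an economic advantage? #COVIDEconomy #ChineseTradeWar",
        "conspiracy_theory": "CT_3",
        "label": "support",
    }
    print(describe(sample))
```

Raising `ValueError` on unknown identifiers or labels is a design choice for the sketch; a pipeline that expects noisy rows might prefer to log and skip them instead.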

Data Splits

The dataset contains a training split only, which consists of 6591 items.
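Because only a training split is provided, anyone evaluating a classifier has to hold out part of the data themselves. The following is a minimal standard-library sketch of a label-stratified split over records shaped like the data instance shown earlier. The function name `stratified_split`, the 20% held-out fraction, and the fixed seed are illustrative assumptions, not something the dataset authors prescribe.

```python
import random
from collections import defaultdict


def stratified_split(records, held_out_fraction=0.2, seed=0):
    """Split records into (train, held_out), preserving label proportions.

    Each record is expected to be a dict with a 'label' key.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible

    # Group records by their class label.
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(record)

    train, held_out = [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = list(by_label[label])
        rng.shuffle(group)
        cut = round(len(group) * held_out_fraction)
        held_out.extend(group[:cut])
        train.extend(group[cut:])
    return train, held_out
```

Since the card notes that generated tweets can be formulaic, near-duplicate tweets may land on both sides of a random split; deduplicating before splitting is worth considering.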

Dataset Creation

The dataset was generated with GPT-3.5, using a separate prompt for each of the support, deny, and neutral classes.
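The three generation prompts quoted in the dataset card share the same opening and closing sentences and differ only in the instruction sentence and the hashtag sentence. A minimal sketch of assembling them from one template follows. The function `build_prompt` and the constant names are hypothetical, and the source shows the theory only as the placeholder `X`, so what text was actually substituted (short title or full description) is an assumption left to the caller.

```python
# Class-specific sentences, copied from the prompts in the dataset card.
CLASS_SENTENCES = {
    "support": (
        "Generate 50 tweets that support this conspiracy theory.",
        "Try to use hashtags that might promote this particular conspiracy theory.",
    ),
    "deny": (
        "Generate 50 tweets that contradict this conspiracy theory.",
        "Try to use hashtags that might debunk this particular conspiracy theory.",
    ),
    "neutral": (
        "Generate 50 tweets that are about COVID-19 but unrelated to the conspiracy theory.",
        "Try to use hashtags that might be used in such a tweet.",
    ),
}

# Sentences shared by all three prompts.
SHARED_TAIL = (
    "Try to use words and terms related to the COVID pandemic. "
    "Do not quote the conspiracy theory verbatim. "
    "Do not repeat tweets and try to make them diversified. "
    "Keep each tweet below the 280 character length limit. "
    "Present the tweets as a list."
)


def build_prompt(label: str, theory: str) -> str:
    """Return the generation prompt for one class and one theory text."""
    if label not in CLASS_SENTENCES:
        raise ValueError(f"unknown label: {label!r}")
    generate, hashtags = CLASS_SENTENCES[label]
    # Drop a trailing period so the template's own period is not doubled.
    theory = theory.strip().rstrip(".")
    return (
        f"Consider the following conspiracy theory: {theory}. "
        f"{generate} {hashtags} {SHARED_TAIL}"
    )


if __name__ == "__main__":
    print(build_prompt("deny", "Vaccines are unsafe."))
```

The sketch only builds the prompt string; how the prompts were sent to GPT-3.5 and how the returned lists were parsed is not described in the source.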

Known Limitations

The generated tweets are sometimes formulaic and lack diversity.

Citation Information

@article{article_id,
  author = {Author List},
  title = {Dataset Paper Title},
  journal = {Publication Venue},
  year = {2525}
}
