
datacommonsorg/datacommons_factcheck

Hugging Face · Updated 2024-01-18 · Indexed 2024-05-25
Download link:
https://hf-mirror.com/datasets/datacommonsorg/datacommons_factcheck
Resource description:
---
annotations_creators:
- expert-generated
language_creators:
- found
language:
- en
license:
- cc-by-nc-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
- n<1K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- fact-checking
paperswithcode_id: null
pretty_name: DataCommons Fact Checked claims
dataset_info:
- config_name: fctchk_politifact_wapo
  features:
  - name: reviewer_name
    dtype: string
  - name: claim_text
    dtype: string
  - name: review_date
    dtype: string
  - name: review_url
    dtype: string
  - name: review_rating
    dtype: string
  - name: claim_author_name
    dtype: string
  - name: claim_date
    dtype: string
  splits:
  - name: train
    num_bytes: 1772321
    num_examples: 5632
  download_size: 671896
  dataset_size: 1772321
- config_name: weekly_standard
  features:
  - name: reviewer_name
    dtype: string
  - name: claim_text
    dtype: string
  - name: review_date
    dtype: string
  - name: review_url
    dtype: string
  - name: review_rating
    dtype: string
  - name: claim_author_name
    dtype: string
  - name: claim_date
    dtype: string
  splits:
  - name: train
    num_bytes: 35061
    num_examples: 132
  download_size: 671896
  dataset_size: 35061
config_names:
- fctchk_politifact_wapo
- weekly_standard
---

# Dataset Card for DataCommons Fact Checked claims

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [Data Commons fact checking FAQ](https://datacommons.org/factcheck/faq)

### Dataset Summary

A dataset of claims fact-checked by news media, maintained by [datacommons.org](https://datacommons.org/). Each entry contains the claim, its author, and the fact-checker's judgment, as well as the URL of the full explanation by the original fact-checker.

The fact checking is done by [FactCheck.org](https://www.factcheck.org/), [PolitiFact](https://www.politifact.com/), and [The Washington Post](https://www.washingtonpost.com/).

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

The data is in English (`en`).

## Dataset Structure

### Data Instances

An example of a fact-checking instance looks as follows:

```
{'claim_author_name': 'Facebook posts',
 'claim_date': '2019-01-01',
 'claim_text': 'Quotes Michelle Obama as saying, "White folks are what’s wrong with America."',
 'review_date': '2019-01-03',
 'review_rating': 'Pants on Fire',
 'review_url': 'https://www.politifact.com/facebook-fact-checks/statements/2019/jan/03/facebook-posts/did-michelle-obama-once-say-white-folks-are-whats-/',
 'reviewer_name': 'PolitiFact'}
```

### Data Fields

A data instance has the following fields:

- `review_date`: the day the fact-checking report was posted. Missing values are replaced with empty strings.
- `review_url`: the URL of the full fact-checking report.
- `reviewer_name`: the name of the fact-checking service.
- `claim_text`: the full text of the claim being reviewed.
- `claim_author_name`: the author of the claim being reviewed. Missing values are replaced with empty strings.
- `claim_date`: the date of the claim. Missing values are replaced with empty strings.
- `review_rating`: the judgment of the fact checker (under `alternateName`; names vary by fact checker).

### Data Splits

No train/validation/test splits are provided; each configuration ships as a single `train` split. The `fctchk_politifact_wapo` configuration contains a total of 5632 fact-checked claims, and the `weekly_standard` configuration contains 132.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

The fact checking is done by [FactCheck.org](https://www.factcheck.org/), [PolitiFact](https://www.politifact.com/), [The Washington Post](https://www.washingtonpost.com/), and [The Weekly Standard](https://www.weeklystandard.com/).

- [FactCheck.org](https://www.factcheck.org/) describes itself as "a nonpartisan, nonprofit 'consumer advocate' for voters that aims to reduce the level of deception and confusion in U.S. politics." It was founded by journalists Kathleen Hall Jamieson and Brooks Jackson and is currently directed by Eugene Kiely.
- [PolitiFact](https://www.politifact.com/) describes its ethics as "seeking to present the true facts, unaffected by agenda or biases, [with] journalists setting their own opinions aside." It was started in August 2007 by Times Washington Bureau Chief Bill Adair. The organization was acquired in February 2018 by the Poynter Institute, a non-profit journalism education and news media research center that also owns the Tampa Bay Times.
- [The Washington Post](https://www.washingtonpost.com/) is a newspaper considered to be near the center of the American political spectrum. In 2013 Amazon.com founder Jeff Bezos bought the newspaper and affiliated publications.

The original data source also contains 132 items reviewed by [The Weekly Standard](https://www.weeklystandard.com/), which was a neo-conservative American newspaper. It is the most politically loaded source of the group: it was originally a vocal critic of the practice of fact-checking and has historically taken stances [close to the American right](https://en.wikipedia.org/wiki/The_Weekly_Standard#Support_of_the_invasion_of_Iraq). It also had to admit responsibility for baseless accusations against a well-known author in a public [libel case](https://en.wikipedia.org/wiki/The_Weekly_Standard#Libel_case). The fact-checked items from this source can be found in the `weekly_standard` configuration but should be used only with full understanding of this context.

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

See the section above describing the [fact checking organizations](#who-are-the-annotators?).

[More Information Needed]

### Other Known Limitations

The dataset is provided for research purposes only. Please check the dataset license for additional information.

## Additional Information

### Dataset Curators

This fact checking dataset is maintained by [datacommons.org](https://datacommons.org/), a Google initiative.

### Licensing Information

All fact-checked items are released under a `CC-BY-NC-4.0` License.

### Citation Information

Data Commons 2020, Fact Checks, electronic dataset, Data Commons, viewed 16 Dec 2020, <https://datacommons.org>.

### Contributions

Thanks to [@yjernite](https://github.com/yjernite) for adding this dataset.
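The card notes that missing values in `review_date`, `claim_author_name`, and `claim_date` are stored as empty strings. The following is a minimal sketch of how a record might be cleaned before analysis; the function name `normalize` is hypothetical, and it assumes that every non-empty date uses the `YYYY-MM-DD` form seen in the card's example instance, which the card does not guarantee.

```python
from datetime import date

# Fields that, according to the card, hold "" when the value is missing.
NULLABLE_FIELDS = ("review_date", "claim_author_name", "claim_date")
# Fields parsed as dates. Assumption: non-empty values are ISO "YYYY-MM-DD".
DATE_FIELDS = ("review_date", "claim_date")


def normalize(record: dict) -> dict:
    """Return a copy with empty strings mapped to None and dates parsed."""
    out = dict(record)
    for field in NULLABLE_FIELDS:
        if out[field] == "":
            out[field] = None
    for field in DATE_FIELDS:
        if out[field] is not None:
            out[field] = date.fromisoformat(out[field])
    return out


# The example instance shown in the card's "Data Instances" section.
example = {
    "claim_author_name": "Facebook posts",
    "claim_date": "2019-01-01",
    "claim_text": 'Quotes Michelle Obama as saying, "White folks are what’s wrong with America."',
    "review_date": "2019-01-03",
    "review_rating": "Pants on Fire",
    "review_url": "https://www.politifact.com/facebook-fact-checks/statements/2019/jan/03/facebook-posts/did-michelle-obama-once-say-white-folks-are-whats-/",
    "reviewer_name": "PolitiFact",
}

cleaned = normalize(example)
# Same record with the claim date blanked, to exercise the missing-value path.
cleaned_missing = normalize(dict(example, claim_date=""))
```

Mapping empty strings to `None` keeps "no value" distinct from real text, so later filtering or date arithmetic does not silently treat a missing date as a valid one.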
Provided by:
datacommonsorg
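Original information summary

Dataset overview

Dataset name

  • Name: DataCommons Fact Checked claims

Dataset summary

  • Summary: A dataset of fact-checked claims from news media, maintained by datacommons.org, containing the claim, its author, and the judgment, as well as the URL of the full explanation by the original fact-checker.

Language

  • Language: English (en)

License

  • License: CC-BY-NC-4.0

Multilinguality

  • Multilinguality: monolingual

Size categories

  • Size: 1K<n<10K, n<1K

Source datasets

  • Source data: original

Task categories

  • Task category: text classification

Task IDs

  • Task ID: fact-checking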

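Dataset structure

Data instances

  • Instance description: Each instance contains the claim author, claim date, claim text, review date, review rating, review URL, and reviewer name.

Data fields

  • Fields:
    • review_date: the date the fact-checking report was published
    • review_url: the URL of the full fact-checking report
    • reviewer_name: the name of the fact-checking service
    • claim_text: the full text of the claim being reviewed
    • claim_author_name: the author of the claim
    • claim_date: the date of the claim
    • review_rating: the fact checker's judgment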
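The seven fields above are all declared as strings in the dataset metadata. As a rough sketch, they can be written down as a typed record and checked against incoming rows. The names `FactCheckRecord` and `schema_problems` are hypothetical and not part of the dataset or of any library.

```python
from typing import TypedDict, get_type_hints


class FactCheckRecord(TypedDict):
    """The seven string fields listed in the dataset metadata."""
    reviewer_name: str
    claim_text: str
    review_date: str
    review_url: str
    review_rating: str
    claim_author_name: str
    claim_date: str


def schema_problems(record: dict) -> list:
    """List the ways a record deviates from the expected fields (empty if none)."""
    expected = get_type_hints(FactCheckRecord)
    problems = []
    for name in expected:
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], str):
            problems.append(f"{name} is not a string")
    for name in record:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    return problems
```

An empty string passes the check on purpose, since missing dates and authors are stored that way.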

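Data splits

  • Splits (each configuration has a single train split):
    • fctchk_politifact_wapo (train): 5632 instances, 1772321 bytes
    • weekly_standard (train): 132 instances, 35061 bytes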
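As a quick sanity check on the figures above, the short sketch below totals the instances across both configurations and estimates the average record size. The dictionary layout and the `summarize` helper are illustrative only; the numbers are copied from the dataset metadata.

```python
# Instance counts and byte sizes as reported in the dataset metadata.
CONFIGS = {
    "fctchk_politifact_wapo": {"num_examples": 5632, "num_bytes": 1772321},
    "weekly_standard": {"num_examples": 132, "num_bytes": 35061},
}


def summarize(configs: dict):
    """Return the total instance count and mean bytes per instance per config."""
    total = sum(c["num_examples"] for c in configs.values())
    mean_bytes = {
        name: c["num_bytes"] / c["num_examples"] for name, c in configs.items()
    }
    return total, mean_bytes


total_examples, mean_bytes = summarize(CONFIGS)
print(f"total instances: {total_examples}")
for name, value in mean_bytes.items():
    print(f"{name}: {value:.1f} bytes per instance")
```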

Dataset creation

Annotators

Dataset maintainers

Licensing information

  • License: All fact-checked items are released under the CC-BY-NC-4.0 license.