FactSpan: Multilingual Fact-Checking Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15084387
Description:
The FactSpan dataset is an extension of the X-Fact dataset, designed to support multilingual fact-checking research. This dataset overcomes limitations in existing datasets by incorporating recent data from the ClaimReview Markup for Data Commons Feed and providing detailed annotations.
Key Features:
Data Source: Claims are sourced from both the X-Fact dataset (up to 2020) and the Data Commons Feed (post-2020).
Validity: Claims are filtered to include only those from organizations recognized by the International Fact-Checking Network (IFCN) and Duke Reporters’ Lab, ensuring high reliability.
Standardized Labels: Verdict labels are standardized into five categories: False, Mostly False, Partly False/Misleading, Mostly True, and True.
Annotations (Annotated Dataset Only): The FactSpan_annotated.csv dataset contains the following columns, with annotations generated using GPT-3.5:
label: The standardized verdict label.
claim: The fact-checked claim.
claimDate: The date of the claim.
claim_year: The year of the claim.
language: The language of the claim.
Position Statements: Indicates the presence of position statements.
Entity/Event Properties: Indicates the presence of entity or event properties.
Quote: Indicates the presence of quotes.
Numerical Data: Indicates the presence of numerical data.
claim type: Categorizes the claim as factual or opinion.
topics: Categorizes the claim into one of five predefined topics (Health and Pandemics, Politics and Governance, Society and Culture, Economy and Environment, Conflict and Security).
mapped_label: An additional mapped label, for edge cases or further label mappings.
Unannotated Dataset: The FactSpan.csv dataset includes:
label: The standardized verdict label.
claim: The fact-checked claim.
claimDate: The date of the claim.
language: The language of the claim.
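A minimal sketch of how the column layout and the five standardized labels described above could be checked before use. It assumes the CSV headers are spelled exactly as listed here, which has not been verified against the actual files; the `summarize` helper, the language codes, the date format, and the sample rows are hypothetical placeholders, not data from FactSpan.

```python
import csv
import io
from collections import Counter

# Column names and label values as listed in the description above; the exact
# spelling and casing in the real CSV files is an assumption.
UNANNOTATED_COLUMNS = ["label", "claim", "claimDate", "language"]
STANDARD_LABELS = {
    "False",
    "Mostly False",
    "Partly False/Misleading",
    "Mostly True",
    "True",
}


def summarize(csv_file, expected_columns=UNANNOTATED_COLUMNS):
    """Check the header, then count rows per (language, label).

    Returns two Counters: rows whose label is one of the five standardized
    values, and rows whose label is anything else.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in expected_columns if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    counts = Counter()
    unexpected = Counter()
    for row in reader:
        label = row["label"].strip()
        if label in STANDARD_LABELS:
            counts[(row["language"], label)] += 1
        else:
            unexpected[label] += 1
    return counts, unexpected


# Made-up placeholder rows for illustration only; not taken from the dataset.
sample = io.StringIO(
    "label,claim,claimDate,language\n"
    "False,placeholder claim A,2021-01-01,en\n"
    "Mostly True,placeholder claim B,2022-06-15,es\n"
    "Unverified,placeholder claim C,2023-03-03,en\n"
)
counts, unexpected = summarize(sample)
```

To run the same check on a downloaded file, open it with `open("FactSpan.csv", newline="", encoding="utf-8")` and pass the file object to `summarize`; the UTF-8 encoding is an assumption.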
Purpose:
This dataset aims to facilitate research in multilingual fact-checking, providing a comprehensive and up-to-date resource for developing and evaluating fact-checking models.
Repository:
The dataset is maintained in a GitHub repository, which also contains scripts for expanding and updating the dataset.
Created:
2025-03-25



