FactSpan: Multilingual Fact-Checking Dataset

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/15084387
Description:
The FactSpan dataset is an extension of the X-Fact dataset, designed to support multilingual fact-checking research. This dataset overcomes limitations in existing datasets by incorporating recent data from the ClaimReview Markup for Data Commons Feed and providing detailed annotations.

Key Features:
Data Source: Claims are sourced from both the X-Fact dataset (up to 2020) and the Data Commons Feed (post-2020).
Validity: Claims are filtered to include only those from organizations recognized by the International Fact-Checking Network (IFCN) and Duke Reporters’ Lab, ensuring high reliability.
Standardized Labels: Verdict labels are standardized into five categories: False, Mostly False, Partly False/Misleading, Mostly True, and True.

Annotations (Annotated Dataset Only): The FactSpan_annotated.csv dataset includes rich annotations generated using GPT-3.5:
  label: The standardized verdict label.
  claim: The fact-checked claim.
  claimDate: The date of the claim.
  claim_year: The year of the claim.
  language: The language of the claim.
  Position Statements: Indicates the presence of position statements.
  Entity/Event Properties: Indicates the presence of entity or event properties.
  Quote: Indicates the presence of quotes.
  Numerical Data: Indicates the presence of numerical data.
  claim type: Categorizes the claim as factual or opinion.
  topics: Categorizes the claim into one of five predefined topics (Health and Pandemics, Politics and Governance, Society and Culture, Economy and Environment, Conflict and Security).
  mapped_label: An additional mapped label, for edge cases or further label mappings.

Unannotated Dataset: The FactSpan.csv dataset includes:
  label: The standardized verdict label.
  claim: The fact-checked claim.
  claimDate: The date of the claim.
  language: The language of the claim.

Purpose: This dataset aims to facilitate research in multilingual fact-checking, providing a comprehensive and up-to-date resource for developing and evaluating fact-checking models.
Repository: The dataset is maintained in the GitHub repository. The repository also contains scripts for expanding and updating the dataset.
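
The column layout described for FactSpan.csv can be sanity-checked with a short script. What follows is a minimal sketch, not the maintainers' tooling: it assumes the CSV header uses exactly the column names listed in the description (label, claim, claimDate, language) and that the five verdict labels are spelled exactly as written there. The function names and the sample rows are hypothetical placeholders for illustration, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Assumed from the dataset description; actual spellings in the CSV may differ.
EXPECTED_COLUMNS = ["label", "claim", "claimDate", "language"]
STANDARD_LABELS = {
    "False",
    "Mostly False",
    "Partly False/Misleading",
    "Mostly True",
    "True",
}


def read_claims(handle):
    """Parse CSV rows and check that every expected column is present."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def split_by_label_validity(rows):
    """Separate rows with a standardized label from rows without one."""
    valid = [r for r in rows if r["label"] in STANDARD_LABELS]
    invalid = [r for r in rows if r["label"] not in STANDARD_LABELS]
    return valid, invalid


def label_counts_by_language(rows):
    """Count verdict labels separately for each language code."""
    counts = {}
    for r in rows:
        counts.setdefault(r["language"], Counter())[r["label"]] += 1
    return counts


# Invented placeholder rows, used only to exercise the functions.
SAMPLE = """label,claim,claimDate,language
False,Placeholder claim A,2021-03-01,en
Mostly True,Placeholder claim B,2019-07-15,es
Unverified,Placeholder claim C,2022-01-10,en
"""

rows = read_claims(io.StringIO(SAMPLE))
valid, invalid = split_by_label_validity(rows)
counts = label_counts_by_language(valid)
print(len(valid), "rows with a standardized label;", len(invalid), "without")
```

To run the same checks on the downloaded file, the `io.StringIO(SAMPLE)` argument could be swapped for `open("FactSpan.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption that seems reasonable for a multilingual dataset but is not stated in the description.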
Created:
2025-03-25