
pszemraj/multi_fc

Hugging Face | Updated: 2022-06-16 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/pszemraj/multi_fc
Resource overview:
The multiFC dataset is a benchmark for automated claim verification with training, validation, and test splits. It was collected from 26 English-language fact-checking websites, comes with rich metadata, and each claim is labeled for veracity by expert journalists. The training and validation splits carry ground-truth labels; the test split carries dummy labels because the original test labels were not released. The features are claimID, claim, label, claimURL, reason, categories, speaker, checker, tags, article title, publish date, climate, and entities.
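
Because only the training and validation splits carry real labels, supervised evaluation has to stay away from the test split. The sketch below shows one way that might look. It assumes the Hugging Face `datasets` library is installed, that the repository ID `pszemraj/multi_fc` (taken from the download link) loads with its default configuration, and that the splits are named `train`, `validation`, and `test`. Both helper names are hypothetical.

```python
from typing import List

# Assumption: split names follow the common Hugging Face convention.
LABELED_SPLITS = ("train", "validation")


def load_multifc():
    """Load the dataset from the Hub (needs network access and `datasets`)."""
    # Imported lazily so the helper below can be used without the library.
    from datasets import load_dataset

    return load_dataset("pszemraj/multi_fc")


def splits_with_real_labels(split_names: List[str]) -> List[str]:
    """Return the splits whose labels are ground truth, keeping input order."""
    return [name for name in split_names if name in LABELED_SPLITS]


# The test split is filtered out because its labels are dummy values.
print(splits_with_real_labels(["train", "test", "validation"]))
```

With the dataset loaded, passing `list(load_multifc().keys())` to `splits_with_real_labels` would give the splits that are safe to score a model on, provided the split names match the assumption above.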
Provider:
pszemraj
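Summary of original information

multiFC dataset overview

Dataset description

  • Task type: automatic claim verification
  • Data source: collected from 26 English-language fact-checking websites
  • Labels: veracity annotated by expert journalists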

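Dataset contents

  • Dataset structure: training, test, and validation splits
  • Feature fields:
    • claimID: claim ID
    • claim: claim text
    • label: label (a dummy value in the test split)
    • claimURL: claim URL
    • reason: reason
    • categories: categories
    • speaker: speaker
    • checker: checker
    • tags: tags
    • article title: article title
    • publish date: publish date
    • climate: climate
    • entities: entities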
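
As a rough illustration of working with the feature list above, the following sketch checks whether a record contains every listed field. It assumes the column names match the list exactly, including the spaces in `article title` and `publish date`, which may not hold for the published files. The helper `missing_fields` and the example record are hypothetical, and the record's values are made up.

```python
from typing import Any, Dict, List

# Field names copied from the feature list; exact column names are an assumption.
EXPECTED_FIELDS = [
    "claimID", "claim", "label", "claimURL", "reason", "categories",
    "speaker", "checker", "tags", "article title", "publish date",
    "climate", "entities",
]


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Return the expected field names absent from a record, in list order."""
    return [field for field in EXPECTED_FIELDS if field not in record]


# Hypothetical record with made-up values, used only to exercise the check.
example = {field: None for field in EXPECTED_FIELDS}
example["claimID"] = "example-00001"
example["claim"] = "An illustrative claim."
del example["entities"]

print(missing_fields(example))  # ['entities']
```

A check like this is cheap to run once per split before training, since a renamed or missing column is easier to diagnose here than inside a model pipeline.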

Dataset size

  • Training split: 27871 records
  • Test split: 3487 records
  • Validation split: 3484 records
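
The record counts above imply roughly an 80/10/10 split. A short calculation makes that explicit, using only the three counts listed here:

```python
# Record counts as listed above.
SPLIT_SIZES = {"train": 27871, "test": 3487, "validation": 3484}

total = sum(SPLIT_SIZES.values())
shares = {name: size / total for name, size in SPLIT_SIZES.items()}

print(total)  # 34842
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Only the training and validation shares, about 90% of the records, come with ground-truth labels.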

Citation

@inproceedings{conf/emnlp2019/Augenstein,
  added-at  = {2019-10-27T00:00:00.000+0200},
  author    = {Augenstein, Isabelle and Lioma, Christina and Wang, Dongsheng and Chaves Lima, Lucas and Hansen, Casper and Hansen, Christian and Grue Simonsen, Jakob},
  booktitle = {EMNLP},
  crossref  = {conf/emnlp/2019},
  publisher = {Association for Computational Linguistics},
  title     = {MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims},
  year      = 2019
}

Collected summary
Dataset introduction
Background and challenges
Background overview
The 'multi_fc' dataset is a large-scale, multi-domain collection for automatic claim verification in which experts have labeled each claim for veracity. It includes textual sources and metadata and is aimed at evidence-based fact-checking. It is split into training, validation, and test sets; the test set does not include ground-truth labels.
The content above was collected and summarized by 遇见数据集.