pszemraj/multi_fc
Source: Hugging Face. Updated 2022-06-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/pszemraj/multi_fc
Resource summary:
The multiFC dataset is a benchmark for automated claim verification with training, validation, and test splits. It is drawn from 26 English-language fact-checking websites, carries rich metadata, and its claims were labeled for veracity by expert journalists. The training and validation splits have ground-truth labels; the test split uses dummy labels because the original data did not provide test labels. The features are claimID, claim, label, claimURL, reason, categories, speaker, checker, tags, article title, publish date, climate, and entities.
Provided by:
pszemraj
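Original information summary
multiFC dataset overview
Dataset description
- Task type: automatic claim verification
- Data source: collected from 26 English-language fact-checking websites
- Labels: veracity labeled by expert journalists
Dataset contents
- Dataset structure: training, test, and validation sets
- Feature fields:
  - claimID: claim ID
  - claim: claim text
  - label: label (dummy value in the test set)
  - claimURL: claim link
  - reason: reason
  - categories: categories
  - speaker: speaker
  - checker: fact-checker
  - tags: tags
  - article title: article title
  - publish date: publish date
  - climate: climate
  - entities: entities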
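The field list above can be mirrored as a typed record when handling rows in Python. The following is a minimal sketch, assuming each row arrives as a dict keyed by the field names exactly as listed (including the spaces in `article title` and `publish date`). The column names in the hosted files may differ, the value types are not stated in this summary, and `ClaimRecord`, `FIELD_NAMES`, and `from_row` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Assumed mapping from Python attribute names to the field names listed above.
FIELD_NAMES = {
    "claim_id": "claimID",
    "claim": "claim",
    "label": "label",  # dummy value in the test set
    "claim_url": "claimURL",
    "reason": "reason",
    "categories": "categories",
    "speaker": "speaker",
    "checker": "checker",
    "tags": "tags",
    "article_title": "article title",
    "publish_date": "publish date",
    "climate": "climate",
    "entities": "entities",
}


@dataclass
class ClaimRecord:
    # Value types are not given in the summary, so every field is left as Any.
    claim_id: Any = None
    claim: Any = None
    label: Any = None
    claim_url: Any = None
    reason: Any = None
    categories: Any = None
    speaker: Any = None
    checker: Any = None
    tags: Any = None
    article_title: Any = None
    publish_date: Any = None
    climate: Any = None
    entities: Any = None

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "ClaimRecord":
        """Build a record from a dict row; missing fields become None."""
        return cls(**{attr: row.get(col) for attr, col in FIELD_NAMES.items()})
```

Keeping the name mapping in one dictionary means only `FIELD_NAMES` needs adjusting if the real column names turn out to be spelled differently.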
Dataset size
- Training set: 27871 records
- Test set: 3487 records
- Validation set: 3484 records
Citation
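As a quick sanity check on the split sizes above, the record counts can be turned into proportions and compared against a loaded copy of the data. This is a minimal sketch using only the counts stated in this summary; `SPLIT_SIZES`, `split_fractions`, and `mismatched_splits` are names introduced here for illustration, and the split keys `train`, `test`, and `validation` are assumptions.

```python
# Record counts as stated in this summary; the split names are assumed.
SPLIT_SIZES = {"train": 27871, "test": 3487, "validation": 3484}


def split_fractions(sizes):
    """Return each split's share of the total record count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def mismatched_splits(splits, expected=SPLIT_SIZES):
    """Return {split: (actual_len_or_None, expected_len)} for splits that differ."""
    mismatches = {}
    for name, want in expected.items():
        got = len(splits[name]) if name in splits else None
        if got != want:
            mismatches[name] = (got, want)
    return mismatches


if __name__ == "__main__":
    print(f"total records: {sum(SPLIT_SIZES.values())}")
    for name, frac in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {frac:.1%}")
```

The counts sum to 34842 records, roughly an 80/10/10 split. `mismatched_splits` accepts any mapping of split name to a sized object, so the result of `datasets.load_dataset("pszemraj/multi_fc")` could be passed to it, assuming the repository loads with its default configuration and uses these split names.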
@inproceedings{conf/emnlp2019/Augenstein,
  added-at = {2019-10-27T00:00:00.000+0200},
  author = {Augenstein, Isabelle and Lioma, Christina and Wang, Dongsheng and Chaves Lima, Lucas and Hansen, Casper and Hansen, Christian and Grue Simonsen, Jakob},
  booktitle = {EMNLP},
  crossref = {conf/emnlp/2019},
  publisher = {Association for Computational Linguistics},
  title = {MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims},
  year = 2019
}
Collected summary
Dataset introduction

Background and challenges
Background overview
The 'multi_fc' dataset is a large-scale, multi-domain collection for automatic claim verification, in which experts have labeled each claim for veracity. It includes textual sources and metadata, with a focus on evidence-based fact-checking. The dataset is split into training, validation, and test sets; the test set carries only dummy labels because the original ground-truth labels were not released.
The content above was collected and summarized by 遇见数据集.



