pszemraj/multi_fc

Name: pszemraj/multi_fc
Creator: pszemraj
Published: 2022-06-16 11:57:52
License: No description provided

Hugging Face. Updated 2022-06-16; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/pszemraj/multi_fc

Resource overview:

The multiFC dataset is a benchmark for automated claim verification with training, validation, and test splits. It is drawn from 26 English-language fact-checking websites, carries rich metadata, and has veracity labels assigned by expert journalists. The training and validation splits have ground-truth labels; the test split uses dummy labels because the original test labels were not released. The features are claimID, claim, label, claimURL, reason, categories, speaker, checker, tags, article title, publish date, climate, and entities.

Provider:

pszemraj

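Summary of original information

multiFC dataset overview

Dataset description

Task type: automatic claim verification
Data source: collected from 26 English-language fact-checking websites
Labels: veracity annotated by expert human journalists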

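Dataset contents

Dataset structure: training, test, and validation sets
Feature fields:
- claimID: claim ID
- claim: claim text
- label: label (dummy value in the test set)
- claimURL: claim URL
- reason: reason
- categories: categories
- speaker: speaker
- checker: fact-checker
- tags: tags
- article title: article title
- publish date: publication date
- climate: climate
- entities: entities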
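
The field list above can be turned into a small sanity check. The following is a minimal sketch, not the dataset's official tooling: it assumes each record is a plain dict keyed by the column names exactly as listed (including the two names that contain spaces), and that the splits are named "train", "validation", and "test". The helper names missing_columns and usable_label are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Column names copied from the feature list above.
# Assumption: records are plain dicts keyed by exactly these names.
MULTI_FC_COLUMNS = [
    "claimID", "claim", "label", "claimURL", "reason", "categories",
    "speaker", "checker", "tags", "article title", "publish date",
    "climate", "entities",
]


def missing_columns(record: Dict[str, Any]) -> List[str]:
    """Return the listed columns that are absent from a record."""
    return [name for name in MULTI_FC_COLUMNS if name not in record]


def usable_label(record: Dict[str, Any], split: str) -> Optional[Any]:
    """Return the label only for splits that carry real labels.

    The test split holds dummy labels, so None is returned for it.
    Assumption: the test split is identified by the name "test".
    """
    if split == "test":
        return None
    return record.get("label")
```

Returning None for the test split keeps dummy labels from leaking into evaluation metrics by accident; callers that need the raw value can still read record["label"] directly.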

Dataset size

Training set: 27871 records
Test set: 3487 records
Validation set: 3484 records
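
As a quick arithmetic check on the sizes above, the sketch below computes the total record count and each split's share. The split keys ("train", "test", "validation") and the helper name split_fractions are illustrative assumptions; only the three counts come from the listing.

```python
# Record counts as given in the listing above.
# Assumption: the dictionary keys are illustrative split names.
SPLIT_SIZES = {"train": 27871, "test": 3487, "validation": 3484}


def split_fractions(sizes):
    """Map each split name to its share of the total record count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total_records = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

print(f"total: {total_records} records")
for name, frac in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]} records ({frac:.1%})")
```

The three splits add up to 34842 records, which works out to roughly an 80/10/10 division between training, test, and validation.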

Citation

@inproceedings{conf/emnlp2019/Augenstein,
  added-at = {2019-10-27T00:00:00.000+0200},
  author = {Augenstein, Isabelle and Lioma, Christina and Wang, Dongsheng and Chaves Lima, Lucas and Hansen, Casper and Hansen, Christian and Grue Simonsen, Jakob},
  booktitle = {EMNLP},
  crossref = {conf/emnlp/2019},
  publisher = {Association for Computational Linguistics},
  title = {MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims},
  year = 2019
}

Collected summary

Dataset introduction

Background and challenges

Background overview

The 'multi_fc' dataset is a large-scale, multi-domain collection for automatic claim verification, with claims labeled for veracity by expert journalists. It includes textual sources and metadata and is aimed at evidence-based fact-checking. The dataset is split into training, validation, and test sets; the test set carries dummy labels because its original ground-truth labels were not released.

The above content was collected and summarized by 遇见数据集.
