allenai/scifact_entailment
Source: Hugging Face. Updated 2023-09-27; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/allenai/scifact_entailment
Resource description:
---
annotations_creators:
- expert-generated
language:
- en
language_creators:
- found
license:
- cc-by-nc-2.0
multilinguality:
- monolingual
pretty_name: SciFact
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- fact-checking
paperswithcode_id: scifact
dataset_info:
  features:
  - name: claim_id
    dtype: int32
  - name: claim
    dtype: string
  - name: abstract_id
    dtype: int32
  - name: title
    dtype: string
  - name: abstract
    sequence: string
  - name: verdict
    dtype: string
  - name: evidence
    sequence: int32
  splits:
  - name: train
    num_bytes: 1649655
    num_examples: 919
  - name: validation
    num_bytes: 605262
    num_examples: 340
  download_size: 3115079
  dataset_size: 2254917
---
# Dataset Card for "scifact_entailment"
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
- [Dataset Structure](#dataset-structure)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
## Dataset Description
- **Homepage:** [https://scifact.apps.allenai.org/](https://scifact.apps.allenai.org/)
- **Repository:** <https://github.com/allenai/scifact>
- **Paper:** [Fact or Fiction: Verifying Scientific Claims](https://aclanthology.org/2020.emnlp-main.609/)
- **Point of Contact:** [David Wadden](mailto:davidw@allenai.org)
### Dataset Summary
SciFact is a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts and annotated with labels and rationales.
For more information on the dataset, see [allenai/scifact](https://huggingface.co/datasets/allenai/scifact).
This dataset contains the same data, reformatted as an entailment task. Each instance pairs a claim with a paper title and abstract, together with an entailment label and a list of evidence sentences (if any).
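To make the entailment framing concrete, here is a minimal sketch of how one instance could be turned into a premise/hypothesis pair. The helper name `to_entailment_pair`, the choice to prepend the title to the abstract, and the example record (including the label string `SUPPORT`) are assumptions made for illustration; the card does not prescribe an input format or list the label values.

```python
from typing import Dict, Tuple


def to_entailment_pair(record: Dict) -> Tuple[str, str, str]:
    """Return (premise, hypothesis, label) for one instance.

    Assumption: the premise is the title followed by the abstract sentences
    joined with spaces. Other layouts are equally plausible.
    """
    # The abstract is stored as a list of sentences, so join it into one passage.
    premise = record["title"] + ". " + " ".join(record["abstract"])
    # The claim plays the role of the hypothesis to be checked against the premise.
    hypothesis = record["claim"]
    # The verdict string is passed through unchanged as the label.
    return premise, hypothesis, record["verdict"]


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "claim_id": 1,
    "claim": "Drug X lowers blood pressure.",
    "abstract_id": 10,
    "title": "A trial of drug X",
    "abstract": ["We studied drug X.", "Blood pressure fell in treated patients."],
    "verdict": "SUPPORT",  # placeholder label value
    "evidence": [1],
}

premise, hypothesis, label = to_entailment_pair(example)
```

The verdict is left as a raw string on purpose, so a downstream model can map it to its own label set.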
## Dataset Structure
### Data Fields
- `claim_id`: An `int32` claim identifier.
- `claim`: The claim text, a `string`.
- `abstract_id`: An `int32` abstract identifier.
- `title`: The paper title, a `string`.
- `abstract`: A list of `string`s, one for each sentence in the abstract.
- `verdict`: The fact-checking verdict, a `string`.
- `evidence`: A list of `int32` indices identifying the sentences in the abstract that provide evidence for the verdict.
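The card's metadata types `evidence` as a sequence of `int32`, which suggests that it stores sentence positions rather than sentence text. The sketch below recovers the evidence sentences under the assumption that those integers are zero-based indices into `abstract`. The helper name `evidence_sentences` and the example record are hypothetical.

```python
from typing import Dict, List


def evidence_sentences(record: Dict) -> List[str]:
    """Look up the abstract sentences referenced by `evidence`.

    Assumption: `evidence` holds zero-based indices into `abstract`.
    Out-of-range indices are skipped rather than raising an error.
    """
    abstract = record["abstract"]
    return [abstract[i] for i in record["evidence"] if 0 <= i < len(abstract)]


# Hypothetical record, invented for illustration (not taken from the dataset).
record = {
    "claim": "Drug X lowers blood pressure.",
    "abstract": [
        "We studied drug X.",
        "Blood pressure fell in treated patients.",
        "No serious side effects were observed.",
    ],
    "verdict": "SUPPORT",  # placeholder label value
    "evidence": [1],
}

sentences = evidence_sentences(record)

# An instance without evidence simply yields an empty list.
no_evidence = evidence_sentences({"abstract": ["Unrelated sentence."], "evidence": []})
```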
### Data Splits
| |train|validation|
|------|----:|---------:|
|claims| 919 | 340|
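The split sizes in the table can serve as a quick sanity check on a local copy. The following is a sketch only: the helper `split_size_mismatches` is hypothetical, and it runs here on stand-in lists rather than on the real data.

```python
from typing import Dict, Mapping, Optional, Sized, Tuple

# Split sizes as listed in the table above.
EXPECTED_SIZES = {"train": 919, "validation": 340}


def split_size_mismatches(
    splits: Mapping[str, Sized],
    expected: Mapping[str, int] = EXPECTED_SIZES,
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {split: (expected, actual)} for every split whose size differs.

    A missing split is reported with an actual size of None.
    """
    mismatches = {}
    for name, want in expected.items():
        got = len(splits[name]) if name in splits else None
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


# Stand-in splits with the documented sizes (placeholders, not real records).
stand_in = {"train": [None] * 919, "validation": [None] * 340}
ok = split_size_mismatches(stand_in)

# A copy with one validation row missing is flagged.
short = {"train": [None] * 919, "validation": [None] * 339}
problems = split_size_mismatches(short)
```

With the Hugging Face `datasets` library installed, the dict-like object returned by `load_dataset("allenai/scifact_entailment")` could presumably be passed to the helper directly. That call needs network access and is not exercised in this sketch.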
Provider:
allenai
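Original information summary
Dataset overview
Dataset summary
SciFact is a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts and annotated with labels and rationales. This version is reformatted as an entailment task: each instance pairs a claim with a paper title and abstract, together with an entailment label and a list of evidence sentences (if any).
Dataset structure
Data fields
- `claim_id`: The claim identifier, of type `int32`.
- `claim`: The claim text, of type `string`.
- `abstract_id`: The abstract identifier, of type `int32`.
- `title`: The paper title, of type `string`.
- `abstract`: A list of the sentences in the abstract, each a `string`.
- `verdict`: The fact-checking verdict, of type `string`.
- `evidence`: A list of sentences from the abstract that provide evidence supporting the verdict.
Data splits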
| | train | validation |
|---|---:|---:|
| claims | 919 | 340 |



