jeffnyman/scifact
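Dataset Overview
Dataset Description
- Name: SciFact
- Summary: The dataset contains expert-written scientific claims paired with evidence-containing abstracts, annotated with labels and rationales.
Dataset Structure
Data Instances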
- claims:
  - Download size: 2.72 MB
  - Generated dataset size: 0.25 MB
  - Total disk usage: 2.97 MB
  - Example:
    { "cited_doc_ids": [14717500], "claim": "1,000 genomes project enables mapping of genetic sequence variation consisting of rare variants with larger penetrance effects than common variants.", "evidence_doc_id": "14717500", "evidence_label": "SUPPORT", "evidence_sentences": [2, 5], "id": 3 }
- corpus:
  - Download size: 2.72 MB
  - Generated dataset size: 7.63 MB
  - Total disk usage: 10.35 MB
  - Example:
    { "abstract": "[\"Alterations of the architecture of cerebral white matter in the developing human brain can affect cortical development and res...", "doc_id": 4983, "structured": false, "title": "Microstructural development of human newborn cerebral white matter assessed in vivo by diffusion tensor magnetic resonance imaging." }
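The two configurations appear to be linked through the evidence fields: a claim's `evidence_doc_id` presumably names a corpus `doc_id`, and `evidence_sentences` picks rationale sentences out of that document's `abstract` list. The following is a minimal, standard-library-only sketch of that join. It assumes the sentence indices are 0-based positions into `abstract`; the helpers `index_corpus` and `rationale_sentences` are hypothetical, and because this card does not reproduce the abstract of document 14717500, the corpus row is a placeholder. Since `evidence_doc_id` is typed as a string while `doc_id` is an int32, the sketch converts the key before the lookup.

```python
import json

# Claim record copied verbatim from the claims example above.
CLAIM_JSON = """
{"cited_doc_ids": [14717500],
 "claim": "1,000 genomes project enables mapping of genetic sequence variation consisting of rare variants with larger penetrance effects than common variants.",
 "evidence_doc_id": "14717500", "evidence_label": "SUPPORT",
 "evidence_sentences": [2, 5], "id": 3}
"""

# HYPOTHETICAL corpus row: the real abstract of document 14717500 is not shown
# in this card, so the title and sentences are placeholders.
CORPUS = [
    {
        "doc_id": 14717500,
        "title": "placeholder title",
        "abstract": [f"placeholder sentence {i}" for i in range(7)],
        "structured": False,
    },
]


def index_corpus(corpus):
    """Hypothetical helper: map each integer doc_id to its corpus row."""
    return {doc["doc_id"]: doc for doc in corpus}


def rationale_sentences(claim, corpus_index):
    """Hypothetical helper: return the abstract sentences cited for a claim.

    Assumes evidence_sentences are 0-based positions in the abstract list.
    evidence_doc_id is a string but doc_id is an int32, hence the int() call.
    Returns [] when the claim has no evidence document or the document is
    absent from the index; an out-of-range sentence index raises IndexError.
    """
    doc_key = claim.get("evidence_doc_id") or ""
    if not doc_key:
        return []
    doc = corpus_index.get(int(doc_key))
    if doc is None:
        return []
    abstract = doc["abstract"]
    return [abstract[i] for i in claim["evidence_sentences"]]


claim = json.loads(CLAIM_JSON)
print(claim["evidence_label"], rationale_sentences(claim, index_corpus(CORPUS)))
```

Run as-is, this prints the label followed by the placeholder sentences at positions 2 and 5. With real data, `CORPUS` would be replaced by the rows of the `corpus` configuration; letting a bad index raise, rather than skipping it, makes a mismatch between the two configurations visible.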
Data Fields
- claims:
  - id: int32
  - claim: string
  - evidence_doc_id: string
  - evidence_label: string
  - evidence_sentences: list of int32
  - cited_doc_ids: list of int32
- corpus:
  - doc_id: int32
  - title: string
  - abstract: list of string
  - structured: bool
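Because the field lists mix string and integer identifiers, a small validator can catch records that drift from them, for example an `evidence_doc_id` that arrives as an integer. The sketch below transcribes the field lists above into plain dictionaries (the names `CLAIM_SCHEMA`, `CORPUS_SCHEMA`, and `validate` are hypothetical, not part of any library) and checks the claims example against them. `int32` is approximated as a Python `int` restricted to the signed 32-bit range.

```python
# Field specs transcribed from the field lists above (hypothetical names).
# A tuple (list, T) means "a list whose items are all of type T";
# int stands in for int32.
CLAIM_SCHEMA = {
    "id": int,
    "claim": str,
    "evidence_doc_id": str,
    "evidence_label": str,
    "evidence_sentences": (list, int),
    "cited_doc_ids": (list, int),
}
CORPUS_SCHEMA = {
    "doc_id": int,
    "title": str,
    "abstract": (list, str),
    "structured": bool,
}


def _matches(value, expected):
    """Check one scalar value against an expected Python type."""
    if expected is int:
        # bool is a subclass of int in Python, so exclude it explicitly,
        # and enforce the signed 32-bit range of int32.
        return (
            isinstance(value, int)
            and not isinstance(value, bool)
            and -(2**31) <= value < 2**31
        )
    return isinstance(value, expected)


def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = [f"missing field: {k}" for k in sorted(schema.keys() - record.keys())]
    problems += [f"unexpected field: {k}" for k in sorted(record.keys() - schema.keys())]
    for key, spec in schema.items():
        if key not in record:
            continue
        value = record[key]
        if isinstance(spec, tuple):
            container, item_type = spec
            ok = isinstance(value, container) and all(_matches(v, item_type) for v in value)
        else:
            ok = _matches(value, spec)
        if not ok:
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems


# The claims example shown earlier in this card.
claim_example = {
    "cited_doc_ids": [14717500],
    "claim": "1,000 genomes project enables mapping of genetic sequence variation consisting of rare variants with larger penetrance effects than common variants.",
    "evidence_doc_id": "14717500",
    "evidence_label": "SUPPORT",
    "evidence_sentences": [2, 5],
    "id": 3,
}

print(validate(claim_example, CLAIM_SCHEMA))  # []
print(validate({**claim_example, "evidence_doc_id": 14717500}, CLAIM_SCHEMA))
# ['wrong type for evidence_doc_id: int']
```

Returning a list of problems instead of raising on the first one makes it easy to report every mismatch in a row at once.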
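Data Splits
- claims:

| name   | train | validation | test |
|--------|------:|-----------:|-----:|
| claims |  1261 |        450 |  300 |

- corpus:

| name   | train |
|--------|------:|
| corpus |  5183 |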
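As a quick check on the claims table, the three splits total 1261 + 450 + 300 = 2011 rows, roughly a 63/22/15 percent division. The sketch below computes those shares and adds a hypothetical helper, `rows_vs_claims`, for comparing row counts with distinct claim `id` values once the data has been loaded (for example through the Hugging Face `datasets` library). Whether a single claim can span several rows in this dataset is an assumption to verify, not something this card states.

```python
# Row counts copied from the claims split table above.
CLAIM_SPLITS = {"train": 1261, "validation": 450, "test": 300}


def split_shares(counts):
    """Return each split's share of all rows, in percent rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def rows_vs_claims(rows):
    """Hypothetical helper: return (row count, distinct claim ids) for a split.

    If a loader emits one row per (claim, evidence document) pair, a claim id
    can repeat and the first number exceeds the second. Whether that happens
    for this dataset is an assumption to verify on the loaded data.
    """
    return len(rows), len({row["id"] for row in rows})


print(sum(CLAIM_SPLITS.values()), split_shares(CLAIM_SPLITS))
# 2011 {'train': 62.7, 'validation': 22.4, 'test': 14.9}
```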
Additional Information
- License: The SciFact dataset is released under CC BY-NC 2.0.
- Citation Information:

@inproceedings{wadden-etal-2020-fact,
    title = "Fact or Fiction: Verifying Scientific Claims",
    author = "Wadden, David and Lin, Shanchuan and Lo, Kyle and Wang, Lucy Lu and van Zuylen, Madeleine and Cohan, Arman and Hajishirzi, Hannaneh",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.609",
    doi = "10.18653/v1/2020.emnlp-main.609",
    pages = "7534--7550",
}




