On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles
Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/4651553
Description:
The ACL-cite dataset was created for the paper: “On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles” published in NAACL 2021. This dataset contains over 2.7 million sentences extracted from scholarly articles (from ACL Anthology [Bird et al.]) and their corresponding citation worthiness labels. The goal of the citation worthiness task is to determine whether a given sentence requires a citation.
There are three CSV files in the dataset:
train.csv: 1,625,268 rows
dev.csv: 539,085 rows
test.csv: 542,081 rows
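As a quick sanity check, the three row counts above sum to 2,706,434 sentences, which is roughly a 60/20/20 train/dev/test split. The minimal sketch below recomputes those figures and counts records in a downloaded file with Python's standard `csv` module. It counts parsed records rather than raw lines, which matters if a quoted sentence field contains an embedded newline. The helpers `count_records` and `count_rows` are hypothetical, and the sketch assumes UTF-8 files with a single header row.

```python
import csv
from typing import Iterable

# Row counts as listed in the dataset description.
EXPECTED_ROWS = {
    "train.csv": 1_625_268,
    "dev.csv": 539_085,
    "test.csv": 542_081,
}


def count_records(lines: Iterable[str]) -> int:
    """Count CSV data records, skipping the (assumed) single header row.

    csv.reader joins quoted fields that span several lines, so a sentence
    containing commas or newlines is still counted once.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)


def count_rows(path: str) -> int:
    # UTF-8 is an assumption about the files, not stated in the description.
    with open(path, newline="", encoding="utf-8") as f:
        return count_records(f)


if __name__ == "__main__":
    total = sum(EXPECTED_ROWS.values())
    print(f"total: {total:,}")  # total: 2,706,434
    for name, n in EXPECTED_ROWS.items():
        print(f"{name}: {n / total:.1%}")  # 60.1%, 19.9%, 20.0%
```

Comparing `count_rows("train.csv")` against `EXPECTED_ROWS["train.csv"]` is a cheap way to confirm a download is complete.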
Each CSV file contains the following columns:
document_id: identifier of the paper the sentence was extracted from
section: name of the section the sentence was extracted from (e.g., Abstract, Introduction)
section_id: sequential identifier of the section in the paper
paragraph_id: sequential identifier of the paragraph the sentence was extracted from
sentence: the sentence with the citations removed
raw_sentence: the raw sentence including the citations
sentence_id: sequential identifier of the sentence in the paper
label: citation worthiness label
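Because `sentence_id` orders sentences within a paper, the columns above are enough to recover each sentence's surrounding text. The sketch below attaches up to `window` preceding and following sentences from the same document to every row. It is an illustration under stated assumptions, not the paper's context model: it assumes `sentence_id` values parse as integers, it ignores section and paragraph boundaries, and the helper names `add_context` and `load_split` are hypothetical. Labels are kept as strings because their encoding is not specified here.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def add_context(rows: Iterable[Dict[str, str]], window: int = 1) -> List[Dict[str, object]]:
    """Attach up to `window` neighbouring sentences from the same paper.

    Uses the citation-free `sentence` column: `raw_sentence` still contains
    the citations, so feeding it to a model would leak the label.
    """
    by_doc: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        by_doc[row["document_id"]].append(row)

    examples: List[Dict[str, object]] = []
    for doc_id, doc_rows in by_doc.items():
        # Assumption: sentence_id is an integer giving reading order in the paper.
        doc_rows.sort(key=lambda r: int(r["sentence_id"]))
        sentences = [r["sentence"] for r in doc_rows]
        for i, row in enumerate(doc_rows):
            examples.append({
                "document_id": doc_id,
                "sentence_id": int(row["sentence_id"]),
                "left": sentences[max(0, i - window):i],
                "sentence": sentences[i],
                "right": sentences[i + 1:i + 1 + window],
                "label": row["label"],  # encoding not specified; kept as text
            })
    return examples


def load_split(path: str, window: int = 1) -> List[Dict[str, object]]:
    # UTF-8 is an assumption about the files.
    with open(path, newline="", encoding="utf-8") as f:
        return add_context(csv.DictReader(f), window)
```

Restricting `left` and `right` to the same `paragraph_id` or `section_id` would be a small change to the grouping key if a narrower context is preferred.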
Note: The train/dev/test splits are made at the document_id level, so all sentences from a given paper fall in the same split.
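Splitting by document keeps neighbouring sentences of one paper from appearing in both training and evaluation data, which matters when a model reads surrounding context. The sketch below checks that property on the downloaded files by testing every pair of splits for shared `document_id` values. The helper names are hypothetical, and UTF-8 encoding is an assumption.

```python
import csv
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def document_ids(rows: Iterable[Dict[str, str]]) -> Set[str]:
    """Collect the distinct document_id values in a split."""
    return {row["document_id"] for row in rows}


def overlapping_documents(splits: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return document_ids shared by each pair of splits (empty dict if disjoint)."""
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        shared = ids_a & ids_b
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


def load_ids(path: str) -> Set[str]:
    with open(path, newline="", encoding="utf-8") as f:
        return document_ids(csv.DictReader(f))


if __name__ == "__main__":
    splits = {name: load_ids(name) for name in ("train.csv", "dev.csv", "test.csv")}
    print(overlapping_documents(splits) or "no document appears in more than one split")
```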
Created:
2021-04-09



