Citation-Context Dataset (C2D)

Source: DataCite Commons. Updated 2020-08-28; indexed 2025-04-16.
Download link:
https://ordo.open.ac.uk/articles/Citation-Context_Dataset_C2D_/6865298/1
Description:
The C2D dataset was created from 2 million full-text open-access research publications obtained from CORE. It contains 53 million unique citation records. To construct C2D, we extracted citation information from each publication: the cited document's title, author(s), publication date, and citation context. The citation-context extraction is described in more detail below.

First, we extracted the positions at which citations are mentioned, together with their citation contexts, i.e. the text surrounding each cited document. For our purpose, a citation context consists of three sentences: the sentence in which the reference is cited, the preceding sentence, and the following sentence. When the citing sentence is at the start of a paragraph, no preceding sentence is extracted; when it is at the end of a paragraph, no following sentence is extracted.

Each record therefore has the following attributes:
ReferenceID - unique identifier of the cited reference in a citing document.
SourceID - unique identifier of the citing document.
ChapterNumber - chapter number of the citing document in which ReferenceID is mentioned.
ParagraphNumber - paragraph number of the citing document in which ReferenceID is mentioned.
SentenceNumber - sentence number of the citing document in which ReferenceID is mentioned.
Title - title of the reference ReferenceID.
PublishedDate - date on which the reference ReferenceID was published.
Authors - author(s) of the reference ReferenceID.
TextBeforeRefMention - the sentence immediately before the sentence in which ReferenceID is cited.
TextWhereRefMention - the sentence in which ReferenceID is cited.
TextAfterRefMention - the sentence immediately after the sentence in which ReferenceID is cited.

Note:
The uncompressed dataset is about 40 GB; the compressed download is about 6.7 GB. Because requirements differ between users, we have released the dataset in raw form. No data cleansing (such as special-character or stop-word removal) has been performed.
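The three-sentence citation-context rule above can be sketched as follows. This is a minimal illustration, not the authors' extraction code. It assumes that each paragraph has already been split into a list of sentences and that the citing sentence is addressed by a 0-based index. It also represents a missing preceding or following sentence as `None`. The raw C2D files may use an empty field instead, and `SentenceNumber` may not be 0-based. The names `CitationContext`, `extract_citation_context` and `context_as_text` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CitationContext:
    """Hypothetical container mirroring the three C2D context attributes."""
    text_before: Optional[str]  # TextBeforeRefMention; None at paragraph start (assumption)
    text_where: str             # TextWhereRefMention
    text_after: Optional[str]   # TextAfterRefMention; None at paragraph end (assumption)


def extract_citation_context(paragraph: List[str], sentence_index: int) -> CitationContext:
    """Return the citing sentence plus its neighbours within the same paragraph.

    `paragraph` is a list of already-segmented sentences (sentence splitting is
    assumed to happen upstream); `sentence_index` is the 0-based position of
    the sentence that contains the citation.
    """
    if not 0 <= sentence_index < len(paragraph):
        raise IndexError("sentence_index is outside the paragraph")
    # No preceding sentence is taken when the citation is in the first sentence.
    before = paragraph[sentence_index - 1] if sentence_index > 0 else None
    # No following sentence is taken when the citation is in the last sentence.
    after = paragraph[sentence_index + 1] if sentence_index + 1 < len(paragraph) else None
    return CitationContext(before, paragraph[sentence_index], after)


def context_as_text(ctx: CitationContext) -> str:
    """Join the sentences that are present into a single context string."""
    parts = [s for s in (ctx.text_before, ctx.text_where, ctx.text_after) if s]
    return " ".join(parts)


if __name__ == "__main__":
    para = ["A is known.", "B extends A [1].", "C follows."]
    print(extract_citation_context(para, 0))
    print(context_as_text(extract_citation_context(para, 1)))
```

Because the window never crosses a paragraph boundary, a citation in a one-sentence paragraph yields only `TextWhereRefMention`, with both neighbours missing.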
Provider:
The Open University
Created:
2018-08-03