scicite
ModelScope community. Updated 2025-07-11; indexed 2025-05-31.
Download link:
https://modelscope.cn/datasets/allenai/scicite
# Dataset Card for "scicite"
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:**
- **Repository:** https://github.com/allenai/scicite
- **Paper:** [Structural Scaffolds for Citation Intent Classification in Scientific Publications](https://arxiv.org/abs/1904.01608)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 23.19 MB
- **Size of the generated dataset:** 5.15 MB
- **Total amount of disk used:** 28.33 MB
### Dataset Summary
This is a dataset for classifying citation intents in academic papers.
The main citation intent label for each JSON object is given by the `label`
key, while the citation context is given by the `string` key. Example:
```
{
'string': 'In chacma baboons, male-infant relationships can be linked to both formation of friendships and paternity success [30,31].',
'sectionName': 'Introduction',
'label': 'background',
'citingPaperId': '7a6b2d4b405439',
'citedPaperId': '9d1abadc55b5e0',
...
}
```
You can retrieve full information about a paper by passing the provided paper IDs
to the Semantic Scholar API (https://api.semanticscholar.org/).
The labels are: Method, Background, Result
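
As a minimal sketch of the paper lookup mentioned above, the snippet below builds a request URL for one paper ID. The endpoint path (`/graph/v1/paper/`), the `fields` query parameter, and the helper name `paper_lookup_url` are assumptions that do not come from this card, so check them against the current Semantic Scholar API documentation. No network request is made here.

```python
from urllib.parse import quote, urlencode

# Assumption: the Graph API "paper details" endpoint. This card only names the
# API host, not a specific route.
S2_PAPER_ENDPOINT = "https://api.semanticscholar.org/graph/v1/paper/"


def paper_lookup_url(paper_id, fields=("title", "year")):
    """Build a lookup URL for one paper ID (hypothetical helper)."""
    # Keep commas readable in the comma-separated field list.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{S2_PAPER_ENDPOINT}{quote(paper_id, safe='')}?{query}"


# The ID is the `citedPaperId` of the validation example in this card.
url = paper_lookup_url("5e413c7872f5df231bf4a4f694504384560e98ca")
print(url)
```

The resulting URL can be fetched with any HTTP client. The same helper works for `citingPaperId` values.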
### Supported Tasks and Leaderboards
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Languages
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Dataset Structure
### Data Instances
#### default
- **Size of downloaded dataset files:** 23.19 MB
- **Size of the generated dataset:** 5.15 MB
- **Total amount of disk used:** 28.33 MB
An example from the `validation` split looks as follows.
```
{
"citeEnd": 68,
"citeStart": 64,
"citedPaperId": "5e413c7872f5df231bf4a4f694504384560e98ca",
"citingPaperId": "8f1fbe460a901d994e9b81d69f77bfbe32719f4c",
"excerpt_index": 0,
"id": "8f1fbe460a901d994e9b81d69f77bfbe32719f4c>5e413c7872f5df231bf4a4f694504384560e98ca",
"isKeyCitation": false,
"label": 2,
"label2": 0,
"label2_confidence": 0.0,
"label_confidence": 0.0,
"sectionName": "Discussion",
"source": 4,
"string": "These results are in contrast with the findings of Santos et al.(16), who reported a significant association between low sedentary time and healthy CVF among Portuguese"
}
```
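
In the example above, `citeStart` and `citeEnd` appear to be character offsets into `string` that delimit the citation marker, with `citeEnd` exclusive. That reading is inferred from this single record, not stated by the card, so treat the sketch below as an assumption to verify on more data. The helper name `citation_span` is illustrative.

```python
def citation_span(record):
    """Return the text between citeStart and citeEnd (assumed half-open offsets)."""
    start, end = int(record["citeStart"]), int(record["citeEnd"])
    return record["string"][start:end]


# Fields copied from the validation example in this card.
example = {
    "citeStart": 64,
    "citeEnd": 68,
    "string": (
        "These results are in contrast with the findings of Santos et al.(16), "
        "who reported a significant association between low sedentary time "
        "and healthy CVF among Portuguese"
    ),
}

print(citation_span(example))  # (16)
```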
### Data Fields
The data fields are the same among all splits.
#### default
- `string`: a `string` feature; the citation context.
- `sectionName`: a `string` feature; the section of the citing paper in which the citation appears (for example, `Introduction` or `Discussion`).
- `label`: a classification label for the main citation intent, with possible values including `method` (0), `background` (1), `result` (2).
- `citingPaperId`: a `string` feature; the ID of the citing paper.
- `citedPaperId`: a `string` feature; the ID of the cited paper.
- `excerpt_index`: an `int32` feature.
- `isKeyCitation`: a `bool` feature.
- `label2`: a classification label, with possible values including `supportive` (0), `not_supportive` (1), `cant_determine` (2), `none` (3).
- `citeEnd`: an `int64` feature.
- `citeStart`: an `int64` feature.
- `source`: a classification label, with possible values including `properNoun` (0), `andPhrase` (1), `acronym` (2), `etAlPhrase` (3), `explicit` (4).
- `label_confidence`: a `float32` feature.
- `label2_confidence`: a `float32` feature.
- `id`: a `string` feature; in the example above, the `citingPaperId` and `citedPaperId` joined by `>`.
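
The three classification fields are stored as integers, so a record usually needs decoding before it is readable. The sketch below uses only the index-to-name mappings listed above. The helper name `decode_labels` is illustrative and is not part of any library.

```python
# Index-to-name mappings copied from the Data Fields list.
LABEL_NAMES = ["method", "background", "result"]
LABEL2_NAMES = ["supportive", "not_supportive", "cant_determine", "none"]
SOURCE_NAMES = ["properNoun", "andPhrase", "acronym", "etAlPhrase", "explicit"]


def decode_labels(record):
    """Return a copy of the record with integer class IDs replaced by names."""
    decoded = dict(record)
    decoded["label"] = LABEL_NAMES[record["label"]]
    decoded["label2"] = LABEL2_NAMES[record["label2"]]
    decoded["source"] = SOURCE_NAMES[record["source"]]
    return decoded


# Class IDs taken from the validation example in this card.
decoded = decode_labels({"label": 2, "label2": 0, "source": 4})
print(decoded)
```

For the validation example shown earlier, this maps `label` 2 to `result`, `label2` 0 to `supportive`, and `source` 4 to `explicit`.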
### Data Splits
| name |train|validation|test|
|-------|----:|---------:|---:|
|default| 8194| 916|1859|
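
After downloading the data with whatever loader you use, a quick sanity check is to compare the observed row counts with the table above. The sketch below hard-codes the counts from the table. The helper names and the shape of the `observed` argument (a mapping from split name to row count) are assumptions made for illustration.

```python
# Row counts copied from the Data Splits table.
EXPECTED_SPLIT_SIZES = {"train": 8194, "validation": 916, "test": 1859}


def split_fractions(sizes):
    """Return each split's share of the total number of rows."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def mismatched_splits(observed, expected=EXPECTED_SPLIT_SIZES):
    """Return the names of splits whose observed size differs from the table."""
    return [name for name, count in expected.items() if observed.get(name) != count]


total_rows = sum(EXPECTED_SPLIT_SIZES.values())
print(total_rows)  # 10969
for name, fraction in split_fractions(EXPECTED_SPLIT_SIZES).items():
    print(f"{name}: {EXPECTED_SPLIT_SIZES[name]} rows ({fraction:.1%})")
```

An empty list from `mismatched_splits` means the loaded data matches the documented split sizes.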
## Dataset Creation
### Curation Rationale
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the source language producers?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Annotations
#### Annotation process
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the annotators?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Personal and Sensitive Information
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Discussion of Biases
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Other Known Limitations
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Additional Information
### Dataset Curators
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Licensing Information
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Citation Information
```
@inproceedings{cohan-etal-2019-structural,
title = "Structural Scaffolds for Citation Intent Classification in Scientific Publications",
author = "Cohan, Arman and
Ammar, Waleed and
van Zuylen, Madeleine and
Cady, Field",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-1361",
doi = "10.18653/v1/N19-1361",
pages = "3586--3596",
}
```
### Contributions
Thanks to [@lewtun](https://github.com/lewtun), [@patrickvonplaten](https://github.com/patrickvonplaten), [@mariamabarham](https://github.com/mariamabarham), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
Provider: maas
Created: 2025-05-27



