peer_read
ModelScope community. Updated 2025-07-11; indexed 2025-05-31.
Download link: https://modelscope.cn/datasets/allenai/peer_read
# Dataset Card for peer_read
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://arxiv.org/abs/1804.09635
- **Repository:** https://github.com/allenai/PeerRead
- **Paper:** https://arxiv.org/pdf/1804.09635.pdf
- **Leaderboard:** [Needs More Information]
- **Point of Contact:** [Needs More Information]
### Dataset Summary
PeerRead is a dataset of scientific peer reviews, made available to help researchers study this important artifact. The dataset consists of over 14K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR, as well as over 10K textual peer reviews written by experts for a subset of the papers.
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
English (`en`)
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
#### parsed_pdfs
- `name`: `string` Filename in the dataset
- `metadata`: `dict` Paper metadata
  - `source`: `string` Paper source
  - `authors`: `list` List of paper authors
  - `title`: `string` Paper title
  - `sections`: `list` List of section headings and their corresponding text
    - `heading`: `string` Section heading
    - `text`: `string` Section text
  - `references`: `list` List of references
    - `title`: `string` Title of reference paper
    - `author`: `list` List of reference paper authors
    - `venue`: `string` Reference venue
    - `citeRegEx`: `string` Reference citeRegEx
    - `shortCiteRegEx`: `string` Reference shortCiteRegEx
    - `year`: `int` Reference publication year
  - `referenceMentions`: `list` List of reference mentions
    - `referenceID`: `int` Reference mention ID
    - `context`: `string` Reference mention context
    - `startOffset`: `int` Reference startOffset
    - `endOffset`: `int` Reference endOffset
  - `year`: `int` Paper publication year
  - `abstractText`: `string` Paper abstract
  - `creator`: `string` Paper creator
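
To make the nesting of the `parsed_pdfs` fields concrete, here is a minimal Python sketch. It assumes a record arrives as plain nested dicts and lists (row-oriented); a loader may lay out nested lists differently. The record `sample_record`, its values, and the helpers `section_headings` and `mentions_per_reference` are hypothetical and exist only for illustration.

```python
# Hypothetical record shaped after the parsed_pdfs field list above.
# Every value is invented for illustration; none comes from the dataset.
sample_record = {
    "name": "example.pdf",
    "metadata": {
        "source": "example-source",
        "authors": ["A. Author", "B. Author"],
        "title": "An Example Paper",
        "sections": [
            {"heading": "1 Introduction", "text": "Intro text."},
            {"heading": "2 Method", "text": "Method text."},
        ],
        "references": [
            {
                "title": "A Cited Paper",
                "author": ["C. Author"],
                "venue": "Example Venue",
                "citeRegEx": "Author,? 2016",
                "shortCiteRegEx": "Author",
                "year": 2016,
            }
        ],
        "referenceMentions": [
            {"referenceID": 0, "context": "as shown by Author (2016)",
             "startOffset": 12, "endOffset": 25},
            {"referenceID": 0, "context": "following Author (2016)",
             "startOffset": 10, "endOffset": 23},
        ],
        "year": 2017,
        "abstractText": "An example abstract.",
        "creator": "LaTeX",
    },
}


def section_headings(record):
    """Return the section headings of a paper in document order."""
    return [section["heading"] for section in record["metadata"]["sections"]]


def mentions_per_reference(record):
    """Count how many mentions carry each referenceID."""
    counts = {}
    for mention in record["metadata"]["referenceMentions"]:
        ref_id = mention["referenceID"]
        counts[ref_id] = counts.get(ref_id, 0) + 1
    return counts


print(section_headings(sample_record))
print(mentions_per_reference(sample_record))
```

Whether `referenceID` indexes into `references` is not stated in this card, so the sketch only counts mentions per ID and does not resolve them to titles.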
#### reviews
- `id`: `int` Review ID
- `conference`: `string` Conference name
- `comments`: `string` Review comments
- `subjects`: `string` Review subjects
- `version`: `string` Review version
- `date_of_submission`: `string` Submission date
- `title`: `string` Paper title
- `authors`: `list` List of paper authors
- `accepted`: `bool` Paper accepted flag
- `abstract`: `string` Paper abstract
- `histories`: `list` Paper details with link
- `reviews`: `dict` Paper reviews
  - `date`: `string` Date of review
  - `title`: `string` Paper title
  - `other_keys`: `string` Reviewer other details
  - `originality`: `string` Originality score
  - `comments`: `string` Reviewer comments
  - `is_meta_review`: `bool` Review type flag
  - `recommendation`: `string` Reviewer recommendation
  - `replicability`: `string` Replicability score
  - `presentation_format`: `string` Presentation type
  - `clarity`: `string` Clarity score
  - `meaningful_comparison`: `string` Meaningful comparison score
  - `substance`: `string` Substance score
  - `reviewer_confidence`: `string` Reviewer confidence score
  - `soundness_correctness`: `string` Soundness correctness score
  - `appropriateness`: `string` Appropriateness score
  - `impact`: `string` Impact score
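
The score fields above are typed `string`, so numeric analysis needs a conversion step. The sketch below shows one possible way to do that for a single review, assuming scores are stored as integer-like strings and may be empty or non-numeric. The card does not say how multiple reviews are laid out inside `reviews`, so the helper works on one review's fields only. `SCORE_FIELDS`, `parse_scores`, and `sample_review` are hypothetical names, and the sample values are invented.

```python
# Score-like fields taken from the reviews field list above.
SCORE_FIELDS = (
    "originality",
    "replicability",
    "clarity",
    "meaningful_comparison",
    "substance",
    "reviewer_confidence",
    "soundness_correctness",
    "appropriateness",
    "impact",
)


def parse_scores(review):
    """Return {field: int} for score fields whose value parses as an integer.

    Missing, empty, or non-numeric values are skipped rather than guessed.
    """
    scores = {}
    for field in SCORE_FIELDS:
        raw = review.get(field)
        if raw is None:
            continue
        try:
            scores[field] = int(raw)
        except (TypeError, ValueError):
            continue
    return scores


# Hypothetical single review; values are invented for illustration.
sample_review = {
    "comments": "The paper is clearly written.",
    "is_meta_review": False,
    "recommendation": "4",
    "originality": "3",
    "clarity": "4",
    "impact": "",
    "reviewer_confidence": "n/a",
}

print(parse_scores(sample_review))
```

Skipping unparseable values keeps missing scores distinct from a score of zero, which matters when averaging across reviews.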
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
Dongyeop Kang, Waleed Ammar, Bhavana Dalvi Mishra, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz
### Licensing Information
[More Information Needed]
### Citation Information
@inproceedings{kang18naacl,
title = {A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications},
author = {Dongyeop Kang and Waleed Ammar and Bhavana Dalvi and Madeleine van Zuylen and Sebastian Kohlmeier and Eduard Hovy and Roy Schwartz},
booktitle = {Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)},
address = {New Orleans, USA},
month = {June},
url = {https://arxiv.org/abs/1804.09635},
year = {2018}
}
### Contributions
Thanks to [@vinaykudari](https://github.com/vinaykudari) for adding this dataset.
Provider: maas
Created: 2025-05-27



