
peer_read

ModelScope community · Updated 2025-07-11 · Indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/allenai/peer_read
Resource description:
# Dataset Card for peer_read

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://arxiv.org/abs/1804.09635
- **Repository:** https://github.com/allenai/PeerRead
- **Paper:** https://arxiv.org/pdf/1804.09635.pdf
- **Leaderboard:** [Needs More Information]
- **Point of Contact:** [Needs More Information]

### Dataset Summary

PeerRead is a dataset of scientific peer reviews, released to help researchers study the peer-review process. It contains over 14K paper drafts with the corresponding accept/reject decisions from top-tier venues including ACL, NIPS and ICLR, as well as over 10K textual peer reviews written by experts for a subset of the papers.

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

English (en)

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

#### parsed_pdfs

- `name`: `string` Filename in the dataset
- `metadata`: `dict` Paper metadata
  - `source`: `string` Paper source
  - `authors`: `list` List of paper authors
  - `title`: `string` Paper title
  - `sections`: `list` List of section headings with their corresponding text
    - `heading`: `string` Section heading
    - `text`: `string` Section text
  - `references`: `list` List of references
    - `title`: `string` Title of the referenced paper
    - `author`: `list` List of the referenced paper's authors
    - `venue`: `string` Reference venue
    - `citeRegEx`: `string` Reference citeRegEx
    - `shortCiteRegEx`: `string` Reference shortCiteRegEx
    - `year`: `int` Reference publication year
  - `referenceMentions`: `list` List of reference mentions
    - `referenceID`: `int` Reference mention ID
    - `context`: `string` Reference mention context
    - `startOffset`: `int` Start offset of the mention
    - `endOffset`: `int` End offset of the mention
  - `year`: `int` Paper publication year
  - `abstractText`: `string` Paper abstract
  - `creator`: `string` Paper creator

#### reviews

- `id`: `int` Review ID
- `conference`: `string` Conference name
- `comments`: `string` Review comments
- `subjects`: `string` Review subjects
- `version`: `string` Review version
- `date_of_submission`: `string` Submission date
- `title`: `string` Paper title
- `authors`: `list` List of paper authors
- `accepted`: `bool` Paper accepted flag
- `abstract`: `string` Paper abstract
- `histories`: `list` Paper details with link
- `reviews`: `dict` Paper reviews
  - `date`: `string` Date of review
  - `title`: `string` Paper title
  - `other_keys`: `string` Other reviewer details
  - `originality`: `string` Originality score
  - `comments`: `string` Reviewer comments
  - `is_meta_review`: `bool` Review type flag
  - `recommendation`: `string` Reviewer recommendation
  - `replicability`: `string` Replicability score
  - `presentation_format`: `string` Presentation type
  - `clarity`: `string` Clarity score
  - `meaningful_comparison`: `string` Meaningful comparison score
  - `substance`: `string` Substance score
  - `reviewer_confidence`: `string` Reviewer confidence score
  - `soundness_correctness`: `string` Soundness/correctness score
  - `appropriateness`: `string` Appropriateness score
  - `impact`: `string` Impact score

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

Dongyeop Kang, Waleed Ammar, Bhavana Dalvi Mishra, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz

### Licensing Information

[More Information Needed]

### Citation Information

    @inproceedings{kang18naacl,
      title = {A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications},
      author = {Dongyeop Kang and Waleed Ammar and Bhavana Dalvi and Madeleine van Zuylen and Sebastian Kohlmeier and Eduard Hovy and Roy Schwartz},
      booktitle = {Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)},
      address = {New Orleans, USA},
      month = {June},
      url = {https://arxiv.org/abs/1804.09635},
      year = {2018}
    }

### Contributions

Thanks to [@vinaykudari](https://github.com/vinaykudari) for adding this dataset.
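
The per-review fields listed under Data Fields are typed as `string`, even for scores. The following is a minimal sketch of one way to handle that, assuming the score strings hold integers when present and may be empty otherwise; the card does not confirm this. The helper names `parse_score` and `mean_aspect_score` and the sample record are hypothetical and are not part of the dataset or its tooling.

```python
from statistics import mean
from typing import Any, Optional

# Aspect-score field names as listed under `reviews` in the Data Fields section.
ASPECT_FIELDS = (
    "originality", "replicability", "clarity", "meaningful_comparison",
    "substance", "soundness_correctness", "appropriateness", "impact",
)


def parse_score(value: Any) -> Optional[int]:
    """Return the score as an int, or None if it is empty or non-numeric.

    Assumption: scores are integer-valued strings when they are present.
    """
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def mean_aspect_score(review: dict) -> Optional[float]:
    """Average the aspect scores that parse as integers; None if none do."""
    parsed = (parse_score(review.get(field, "")) for field in ASPECT_FIELDS)
    numeric = [score for score in parsed if score is not None]
    return mean(numeric) if numeric else None


# Hypothetical review record, invented for illustration only.
sample_review = {
    "is_meta_review": False,
    "originality": "4",
    "clarity": "3",
    "impact": "",  # a missing score is skipped rather than counted as zero
    "comments": "Well written.",
}

if __name__ == "__main__":
    print(mean_aspect_score(sample_review))
```

Skipping unparseable values rather than treating them as zero keeps a sparsely scored review from being pulled down by the fields its venue did not ask for.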

Provider:
maas
Created:
2025-05-27