eLife Open Peer Review Corpus
Source: DataCite Commons. Updated 2024-03-26; indexed 2025-04-16.
Download link:
https://repod.icm.edu.pl/citation?persistentId=doi:10.18150/FKPEQN
Description:
eLife Open Peer Review Corpus

Section for Logic & Cognitive Science, Institute of Philosophy and Sociology, Polish Academy of Sciences

Generated by Ksawery Jasieński under the supervision of Marcin Miłkowski (2022)

eLife is committed to the idea of open peer review. Its reviews are not available for download in a single package, so they must be extracted from the complete eLife data set, which is several gigabytes in size.

The data set contains all peer reviews available as of 24 July 2022 (10853 papers with reviews, or at least with decision letters that omit minor comments). As stated by eLife:

"In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included."

In addition, the corpus contains metadata about particular reviews (the filename contains 'r' and a numerical id before '.xml'), author responses (the filename then contains 'a' and a number before the '.xml' suffix), and paper metadata in JSON format. The JSON schema files are in the schema subdirectory, in files with self-explanatory names.

The original files, which are in the XML format used by eLife, were neither converted nor enriched with any linguistic annotation.

The crawler code is also available, in the review_crawler directory. For eLife reviews, elife_crawler.py should be used (more notes in the subdirectory).

The files are made available under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
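The filename convention described above can be used to sort corpus files by type. The following is a minimal sketch, not part of the corpus or its crawler: the function name `classify`, the category labels, and the example filenames are hypothetical, and the regular expressions assume that the 'r' or 'a' marker and its number sit directly before the '.xml' suffix. Any file ending in '.json' is treated as paper metadata, which is also an assumption.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed patterns: 'r' or 'a' followed by digits, directly before '.xml'.
_REVIEW = re.compile(r"r(\d+)\.xml$")
_RESPONSE = re.compile(r"a(\d+)\.xml$")


def classify(filename: str) -> Tuple[str, Optional[int]]:
    """Return a (category, number) pair for a corpus file name.

    The category labels are hypothetical, chosen for this sketch only.
    """
    name = Path(filename).name
    if name.endswith(".json"):
        return ("paper_metadata", None)
    match = _REVIEW.search(name)
    if match:
        return ("review", int(match.group(1)))
    match = _RESPONSE.search(name)
    if match:
        return ("author_response", int(match.group(1)))
    return ("other", None)


if __name__ == "__main__":
    # Hypothetical file names; the real corpus may name files differently.
    for example in ["elife-00001-r1.xml", "elife-00001-a2.xml", "elife-00001.json"]:
        print(example, classify(example))
```

Because the patterns only look at the end of the name, a file whose base name happens to end in 'r' or 'a' plus digits would also match, so the result is worth checking against the notes in the review_crawler subdirectory before relying on it.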
Provider:
RepOD
Created:
2022-07-28



