
Corpus of Sentence-level Revisions in Academic Writing

Source: arXiv. Updated: 2014-06-01. Indexed: 2024-06-21.
Download link:
http://chenhaot.com/pages/statement-strength.html
Overview:
The Corpus of Sentence-level Revisions in Academic Writing was created by the Department of Computer Science at Cornell University to study how statement strength changes in academic text. It contains revision records for 23,000 papers on arXiv in 2011 and focuses on sentence-level changes such as deletions, typo corrections, and rewrites. To build it, text was extracted from the multiple versions of each paper, and sentences were aligned across versions with a dynamic programming algorithm. The corpus is intended for work on scientific communication and academic writing: understanding and evaluating how statement strength affects reader comprehension, and how revision can improve the way information is conveyed.
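The overview mentions dynamic-programming sentence alignment without giving details. The following is a minimal Python sketch of that general idea, not the corpus authors' actual procedure. The similarity measure (`difflib.SequenceMatcher`), the `threshold` parameter, and the function names are all assumptions introduced here for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (an assumed stand-in metric)."""
    return SequenceMatcher(None, a, b).ratio()


def match_gain(a: str, b: str, threshold: float) -> float:
    """Score added by pairing two sentences; negative if they are too dissimilar."""
    return similarity(a, b) - threshold


def align_sentences(old, new, threshold=0.5):
    """Align two sentence lists in order, maximizing total match gain.

    Returns a list of (old_sentence, new_sentence) pairs. A pair with
    None on the right marks a deleted sentence; None on the left marks
    an inserted one. `threshold` is a hypothetical cutoff: sentences are
    paired only when their similarity exceeds it.
    """
    n, m = len(old), len(new)
    # score[i][j] = best total gain when aligning old[:i] with new[:j];
    # leaving a sentence unaligned costs nothing.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            paired = score[i - 1][j - 1] + match_gain(old[i - 1], new[j - 1], threshold)
            score[i][j] = max(paired, score[i - 1][j], score[i][j - 1])

    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        paired = score[i - 1][j - 1] + match_gain(old[i - 1], new[j - 1], threshold)
        if score[i][j] == paired:
            pairs.append((old[i - 1], new[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            pairs.append((old[i - 1], None))  # sentence removed in the new version
            i -= 1
        else:
            pairs.append((None, new[j - 1]))  # sentence added in the new version
            j -= 1
    while i > 0:
        pairs.append((old[i - 1], None))
        i -= 1
    while j > 0:
        pairs.append((None, new[j - 1]))
        j -= 1
    pairs.reverse()
    return pairs
```

Calling `align_sentences(old_version, new_version)` on two lists of sentences yields aligned pairs in document order. Pairs whose two sides differ would be the candidate rewrites, and pairs with `None` on one side would be candidate deletions or insertions.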
Provided by:
Cornell University
Created:
2014-05-07