
Discourse Segment Type vs. Linguistic Features

Source: doi.org (indexed 2025-03-24)
Download link:
http://doi.org/10.17632/4bh33fdx4v.3
Description:
1. Ten full-text papers in biology were annotated; see 170220_deWaard_Corpus for full references. The papers were selected according to three criteria:
   1.1. Papers related to the Voorhoeve paper (Voorhoeve). (*)
   1.2. Papers regarding neuropharmacology (Neuro). (**)
   1.3. Papers from the Genia corpus (Genia). (***)
2. The papers were obtained by downloading the HTML, converting it into text, and copying the text into an Excel spreadsheet.
3. Each paper was annotated as follows:
   3.1. The first letter of the first author's name was added (column 1).
   3.2. The papers were (manually) split into discourse segments, as described in [2].
   3.3. The section names were added.
   3.4. Segment types were identified, according to the categories defined in [2].
   3.5. Verb tense/modality/voice was annotated, according to the categories defined in [2].
   3.6. Verb class was added from a taxonomy described in [3].
   3.7. Modality features were added according to categories described in [4].
4. The final results with the text enclosed can be found in the file 170220_deWaard_DST_With_Text.
5. The final results with only numerical results, for ease of statistical processing, can be found in the files 170220_deWaard_DST_Codes.
6. The CodeBook describing the mapping of the numerical results to the values can be found in the file 170220_deWaard_Value_Labels.

References:
[2] de Waard, A. and Pander Maat, H. (2009). Categorizing Epistemic Segment Types in Biology Research Articles. In Proceedings of the Workshop on Linguistic and Psycholinguistic Approaches to Text Structuring (LPTS 2009).
[3] de Waard, A. and Pander Maat, H. (2010). A classification of research verbs to facilitate discourse segment identification in biological texts. Proceedings from The Interdisciplinary Workshop on Verbs. The identification and representation of verb features. Pisa, Italy.
[4] de Waard, A. and Pander Maat, H. (2012). Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and Overview of Features. In Proceedings of the Workshop on Detecting Structure in Scholarly Discourse (DSDD), ACL 2012.
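
The relationship between the numerical codes file and the CodeBook can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the column names (`segment_type`, `verb_tense`), the numeric codes, and the labels attached to them are invented placeholders, not values taken from 170220_deWaard_DST_Codes or 170220_deWaard_Value_Labels. The sketch decodes each coded row through a codebook and then counts segment type against one linguistic feature.

```python
from collections import Counter

# HYPOTHETICAL codebook: column name -> {numeric code -> label}.
# The real mapping is in 170220_deWaard_Value_Labels; these entries are placeholders.
VALUE_LABELS = {
    "segment_type": {1: "Fact", 2: "Hypothesis", 3: "Result"},
    "verb_tense": {1: "Present", 2: "Past"},
}

# HYPOTHETICAL coded rows, standing in for rows of 170220_deWaard_DST_Codes.
CODED_ROWS = [
    {"segment_type": 1, "verb_tense": 1},
    {"segment_type": 3, "verb_tense": 2},
    {"segment_type": 3, "verb_tense": 2},
    {"segment_type": 2, "verb_tense": 1},
]


def decode_row(row, value_labels):
    """Replace each numeric code with its label; leave unknown codes unchanged."""
    return {
        column: value_labels.get(column, {}).get(code, code)
        for column, code in row.items()
    }


def cross_tabulate(rows, row_key, col_key):
    """Count how often each (row_key value, col_key value) pair occurs."""
    return Counter((row[row_key], row[col_key]) for row in rows)


decoded = [decode_row(row, VALUE_LABELS) for row in CODED_ROWS]
table = cross_tabulate(decoded, "segment_type", "verb_tense")

for (segment, tense), count in sorted(table.items()):
    print(f"{segment:<12}{tense:<10}{count}")
```

With the real files, the placeholder dictionaries would be replaced by data read from the spreadsheets, and the column names would need to match whatever headers those files actually use.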

Provider:
doi.org