Discourse Segment Type vs. Linguistic Features
Source: doi.org (indexed 2025-03-24)
Download link: http://doi.org/10.17632/4bh33fdx4v.3

Description:
1. Ten full-text biology papers were annotated; see 170220_deWaard_Corpus for full references. The papers were selected according to three criteria:
1.1. Papers related to the Voorhoeve paper (Voorhoeve). (*)
1.2. Papers regarding neuropharmacology (Neuro). (**)
1.3. Papers from the Genia corpus (Genia). (***)
2. The papers were downloaded as HTML, converted to plain text, and copied into an Excel spreadsheet.
3. Each paper was annotated as follows:
3.1. The first letter of the first author's name was added (column 1);
3.2. The papers were manually split into discourse segments, as described in [2];
3.3. The section names were added;
3.4. Segment types were identified, according to the categories defined in [2];
3.5. Verb tense/modality/voice was annotated, according to the categories defined in [2];
3.6. Verb class was assigned using the taxonomy described in [3];
3.7. Modality features were added according to the categories described in [4].
4. The final annotations, including the segment text, are in the file 170220_deWaard_DST_With_Text.
5. The final annotations as numerical codes only, for ease of statistical processing, are in the file 170220_deWaard_DST_Codes.
6. The codebook that maps the numerical codes to their values is in the file 170220_deWaard_Value_Labels.
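Items 5 and 6 imply a decode-then-count workflow for relating segment types to linguistic features. The following is a minimal sketch in Python, assuming the coded rows have already been loaded (for example from a CSV export of 170220_deWaard_DST_Codes) as dictionaries keyed by column name. The column names (`segment_type`, `tense`), the numeric codes, and the labels below are hypothetical placeholders, not taken from 170220_deWaard_Value_Labels.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL codebook fragment: codes and labels are placeholders only.
# The real mapping is defined in 170220_deWaard_Value_Labels.
CODEBOOK = {
    "segment_type": {1: "Result", 2: "Hypothesis", 3: "Method"},
    "tense": {1: "present", 2: "past"},
}


def decode(row, codebook):
    """Replace each coded value with its label; unknown columns or codes are kept as-is."""
    return {col: codebook.get(col, {}).get(val, val) for col, val in row.items()}


def crosstab(rows, row_var, col_var):
    """Count how often each value of row_var co-occurs with each value of col_var."""
    table = defaultdict(Counter)
    for r in rows:
        table[r[row_var]][r[col_var]] += 1
    return table


# HYPOTHETICAL coded segments (one dict per discourse segment).
rows = [
    {"segment_type": 1, "tense": 2},
    {"segment_type": 1, "tense": 2},
    {"segment_type": 2, "tense": 1},
]
decoded = [decode(r, CODEBOOK) for r in rows]
table = crosstab(decoded, "segment_type", "tense")
print(dict(table["Result"]))  # {'past': 2}
```

Decoding before counting keeps the contingency table readable; for significance testing the same counts could be passed to any chi-square routine.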
[2] de Waard, A. and Pander Maat, H. (2009). Categorizing Epistemic Segment Types in Biology Research Articles. In Proceedings of the Workshop on Linguistic and Psycholinguistic Approaches to Text Structuring (LPTS 2009).
[3] de Waard, A. and Pander Maat, H. (2010). A classification of research verbs to facilitate discourse segment identification in biological texts. In Proceedings of the Interdisciplinary Workshop on Verbs: The Identification and Representation of Verb Features, Pisa, Italy.
[4] de Waard, A. and Pander Maat, H. (2012). Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and Overview of Features. In Proceedings of the Workshop on Detecting Structure in Scholarly Discourse (DSSD), ACL 2012.