RST Signalling Corpus

Name: RST Signalling Corpus
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:27:47
License: No description available

Source: DataCite Commons (updated 2021-07-01, indexed 2025-04-16)

Download link:

https://catalog.ldc.upenn.edu/LDC2015T10

Resource description:

<h3>Introduction</h3>
<p>RST Signalling Corpus was developed at Simon Fraser University and contains annotations for signalling information added to RST Discourse Treebank (<a href="../../../LDC2002T07">LDC2002T07</a>). RST Discourse Treebank (RST-DT) is a collection of English news texts annotated for rhetorical relations under the RST (Rhetorical Structure Theory) framework. In RST Signalling Corpus, information about textual signals (such as <em>although</em>, <em>because</em>, <em>thus</em>) and about other signals such as tense, lexical chains or punctuation was added as an annotation layer to examine how rhetorical relations are signalled in discourse.</p>
<h3>Data</h3>
<p>The source data consists of 385 Wall Street Journal news articles from the <a href="../../../LDC99T42">Penn Treebank</a> annotated for rhetorical relations in RST Discourse Treebank. As in RST-DT, the data in this release is divided into a training set (347 articles) and a test set (38 articles).</p>
<p>The signalling annotation in this data set was performed using the <a href="http://www.wagsoft.com/CorpusTool/">UAM CorpusTool</a> version 2.8.12. Files are presented as UTF-8 encoded XML and plain text. The corpus is divided into three annotation sub-directories: training, test and full. All sub-directories include source, metadata, signalling annotation, and DTD files.</p>
<h3>Samples</h3>
<p>Please view the following samples:</p>
<ul>
<li><a href="desc/addenda/LDC2015T10.metadata.xml">Metadata Sample</a></li>
<li><a href="desc/addenda/LDC2015T10.signal.xml">Signal Sample</a></li>
<li><a href="desc/addenda/LDC2015T10.txt">Text Sample</a></li>
</ul>
<h3>Updates</h3>
<p>None at this time.</p>
<p>Portions © 1987-1989 Dow Jones & Company, Inc., © 2015 Debopam Das, © 2015 Maite Taboada, © 1995, 1999, 2002, 2015 Trustees of the University of Pennsylvania</p>
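
Because the description states that the release ships UTF-8 XML files under training, test and full sub-directories, a first look at the annotation layer could start from a simple tag inventory. The following is a minimal sketch, not an official loader: the function name `tally_xml_tags`, the `*.xml` file pattern and the corpus path are assumptions made for illustration, and the actual element names in the signalling files are not assumed here.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def tally_xml_tags(corpus_root, subsets=("training", "test", "full")):
    """Count XML element tags per sub-directory of a local corpus copy.

    The sub-directory names follow the catalog description; everything
    else (file pattern, layout below each sub-directory) is an assumption.
    """
    root = Path(corpus_root)
    counts = {}
    for subset in subsets:
        tags = Counter()
        subset_dir = root / subset
        if subset_dir.is_dir():
            # Search recursively, since the exact nesting is not documented here.
            for xml_path in sorted(subset_dir.rglob("*.xml")):
                try:
                    tree = ET.parse(xml_path)
                except ET.ParseError:
                    # Skip files that are not well-formed XML.
                    continue
                tags.update(element.tag for element in tree.iter())
        counts[subset] = tags
    return counts
```

Calling `tally_xml_tags("path/to/LDC2015T10")` (a hypothetical path) returns one `Counter` per sub-directory, which gives a quick view of which elements occur before writing a schema-specific parser against the supplied DTD files.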

Provider:

Linguistic Data Consortium

Created:

2020-11-30
