RST Signalling Corpus
DataCite Commons | Updated 2021-07-01 | Indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2015T10
Resource description:
<h3>Introduction</h3><br>
<p>RST Signalling Corpus was developed at Simon Fraser University and contains annotations for signalling information added to RST Discourse Treebank (<a href="../../../LDC2002T07">LDC2002T07</a>). RST Discourse Treebank (RST-DT) is a collection of English news texts annotated for rhetorical relations under the RST (Rhetorical Structure Theory) framework. In RST Signalling Corpus, information about textual signals such as <em>although</em>, <em>because</em> and <em>thus</em>, together with other signals such as tense, lexical chains or punctuation, was added as an annotation layer to examine how rhetorical relations are signalled in discourse.</p><br>
<h3>Data</h3><br>
<p>The source data consists of 385 Wall Street Journal news articles from the <a href="../../../LDC99T42">Penn Treebank</a> annotated for rhetorical relations in RST Discourse Treebank. As in RST-DT, the data in this release is divided into a training set (347 articles) and a test set (38 articles).</p><br>
<p>The signalling annotation in this data set was performed using the <a href="http://www.wagsoft.com/CorpusTool/">UAM CorpusTool</a> version 2.8.12. Files are presented as UTF-8 encoded XML and plain text. The corpus is divided into three annotation sub-directories: training, test and full. Each sub-directory includes source, metadata, signalling annotation and DTD files.</p><br>
<h3>Samples</h3><br>
<p>Please view the following samples:</p><br>
<ul><br>
<li><a href="desc/addenda/LDC2015T10.metadata.xml">Metadata Sample</a></li><br>
<li><a href="desc/addenda/LDC2015T10.signal.xml">Signal Sample</a></li><br>
<li><a href="desc/addenda/LDC2015T10.txt">Text Sample</a></li><br>
</ul><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 1987-1989 Dow Jones & Company, Inc., © 2015 Debopam Das, © 2015 Maite Taboada, © 1995, 1999, 2002, 2015 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30



