
RST Signalling Corpus

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2015T10
Description:
<h3>Introduction</h3>
<p>RST Signalling Corpus was developed at Simon Fraser University and contains annotations for signalling information added to RST Discourse Treebank (<a href="../../../LDC2002T07">LDC2002T07</a>). RST Discourse Treebank (RST-DT) is a collection of English news texts annotated for rhetorical relations under the RST (Rhetorical Structure Theory) framework. In RST Signalling Corpus, information about textual signals such as <em>although</em>, <em>because</em> and <em>thus</em>, and about other signals such as tense, lexical chains or punctuation, was added as an annotation layer to examine how rhetorical relations are signalled in discourse.</p>
<h3>Data</h3>
<p>The source data consists of 385 Wall Street Journal news articles from the <a href="../../../LDC99T42">Penn Treebank</a> annotated for rhetorical relations in RST Discourse Treebank. As in RST-DT, the data in this release is divided into a training set (347 articles) and a test set (38 articles).</p>
<p>The signalling annotation in this data set was performed using the <a href="http://www.wagsoft.com/CorpusTool/">UAM CorpusTool</a> version 2.8.12. Files are presented as UTF-8 encoded XML and plain text. The corpus is divided into three annotation sub-directories: training, test and full. All sub-directories include source, metadata, signalling annotation, and dtd files.</p>
<h3>Samples</h3>
<p>Please view the following samples:</p>
<ul>
<li><a href="desc/addenda/LDC2015T10.metadata.xml">Metadata Sample</a></li>
<li><a href="desc/addenda/LDC2015T10.signal.xml">Signal Sample</a></li>
<li><a href="desc/addenda/LDC2015T10.txt">Text Sample</a></li>
</ul>
<h3>Updates</h3>
<p>None at this time.</p>
<p>Portions © 1987-1989 Dow Jones &amp; Company, Inc., © 2015 Debopam Das, © 2015 Maite Taboada, © 1995, 1999, 2002, 2015 Trustees of the University of Pennsylvania</p>
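Since the description states that the annotation files are UTF-8 encoded XML but does not give their schema, a first look at a local copy might simply tally which element names occur. The following is a minimal sketch under that assumption: the function name `count_xml_tags` and the directory argument are hypothetical, and no tag or attribute names from the corpus are assumed.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def count_xml_tags(root_dir):
    """Count element tag names across all .xml files under root_dir.

    root_dir is a hypothetical path to a local copy of the corpus
    (for example the training, test or full sub-directory).
    """
    counts = Counter()
    for xml_path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(str(xml_path))
        except ET.ParseError:
            # Skip files that are not well-formed XML rather than abort.
            continue
        for element in tree.iter():
            counts[element.tag] += 1
    return counts
```

Calling `count_xml_tags("training")` on a local copy would return a `Counter` mapping each tag name to its frequency, which can then be compared against the dtd files shipped in the same sub-directory.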
Provider:
Linguistic Data Consortium
Created:
2020-11-30