RST-DT Dataset
Source: paperswithcode.com (indexed 2025-01-21)
Download link:
https://paperswithcode.com/dataset/rst-dt
Description:
The Rhetorical Structure Theory (RST) Discourse Treebank consists of 385 Wall Street Journal articles
from the Penn Treebank annotated with discourse structure in the RST framework along with
human-generated extracts and abstracts associated with the source documents.
In the RST framework (Mann and Thompson, 1988), a text's discourse structure can be
represented as a tree with four properties:
(1) the leaves correspond to text fragments called elementary discourse units (the minimal discourse units);
(2) the internal nodes of the tree correspond to contiguous text spans;
(3) each node is characterized by its nuclearity, or essential unit of information; and
(4) each node is also characterized by a rhetorical relation between two or more non-overlapping, adjacent text spans.
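The four properties above can be sketched as a small data structure. This is an illustrative sketch only, not the treebank's file format or any official API: the class name `RSTNode`, its field names, the helper functions, the toy sentence, and the relation label are all assumptions made for this example. For simplicity it stores the relation on the parent node that joins the spans.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSTNode:
    """Hypothetical node type; names are illustrative, not from the corpus."""
    start: int                      # index of the first EDU covered (inclusive)
    end: int                        # index of the last EDU covered (inclusive)
    nuclearity: str                 # e.g. "Root", "Nucleus", or "Satellite"
    relation: Optional[str] = None  # relation holding among the children
    text: Optional[str] = None      # set only on leaves (EDUs)
    children: List["RSTNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def edus(node: RSTNode) -> List[str]:
    """Collect the elementary discourse units (leaves) in left-to-right order."""
    if node.is_leaf():
        return [node.text]
    collected: List[str] = []
    for child in node.children:
        collected.extend(edus(child))
    return collected


def is_well_formed(node: RSTNode) -> bool:
    """Check that internal nodes join two or more adjacent, non-overlapping spans."""
    if node.is_leaf():
        return node.start == node.end and node.text is not None
    if len(node.children) < 2:
        return False
    # The children must exactly tile the parent's span.
    if node.children[0].start != node.start or node.children[-1].end != node.end:
        return False
    for left, right in zip(node.children, node.children[1:]):
        if right.start != left.end + 1:  # a gap or an overlap
            return False
    return all(is_well_formed(child) for child in node.children)


# Toy example (invented text, not taken from the treebank).
tree = RSTNode(
    start=0, end=1, nuclearity="Root", relation="Explanation",
    children=[
        RSTNode(0, 0, "Nucleus", text="The company cut prices"),
        RSTNode(1, 1, "Satellite", text="because demand fell."),
    ],
)
```

Calling `edus(tree)` returns the two leaf fragments in reading order, and `is_well_formed(tree)` checks the contiguity and adjacency constraints from items (2) and (4).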
Data
The data in this release is divided into a training set (347 documents) and a test set (38 documents).
All annotations were produced using a discourse annotation tool that can be downloaded from http://www.isi.edu/~marcu/discourse.
Provider:
Papers with Code



