RST-DT Dataset

Indexed from paperswithcode.com on 2025-01-21
Download link:
https://paperswithcode.com/dataset/rst-dt
Description:
The Rhetorical Structure Theory (RST) Discourse Treebank consists of 385 Wall Street Journal articles from the Penn Treebank, annotated with discourse structure in the RST framework, along with human-generated extracts and abstracts associated with the source documents. In the RST framework (Mann and Thompson, 1988), a text's discourse structure can be represented as a tree with four properties: (1) the leaves correspond to text fragments called elementary discourse units (the minimal discourse units); (2) the internal nodes of the tree correspond to contiguous text spans; (3) each node is characterized by its nuclearity, or essential unit of information; and (4) each node is also characterized by a rhetorical relation between two or more non-overlapping, adjacent text spans.

Data

The data in this release is divided into a training set (347 documents) and a test set (38 documents). All annotations were produced using a discourse annotation tool that can be downloaded from http://www.isi.edu/~marcu/discourse.
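The four tree properties listed in the description can be made concrete with a small data structure. The following is a minimal Python sketch under stated assumptions: the class `RSTNode`, its field names, the helper `is_well_formed`, and the two-unit example are all hypothetical illustrations, not the corpus file format or any tool's API. It also assumes, as standard RST holds, that every relation has at least one nucleus among its spans.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSTNode:
    """One node of an RST tree (hypothetical in-memory layout, not the corpus format)."""
    nuclearity: str                     # assumed labels: "Nucleus", "Satellite", or "Root"
    relation: Optional[str] = None      # relation holding between this node's children
    text: Optional[str] = None          # set only on leaves (elementary discourse units)
    children: List["RSTNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def edus(self) -> List[str]:
        """Return the EDU texts covered by this node, left to right.

        Because children are stored in text order, every internal node
        covers a contiguous span of EDUs by construction.
        """
        if self.is_leaf():
            return [self.text or ""]
        covered: List[str] = []
        for child in self.children:
            covered.extend(child.edus())
        return covered


def is_well_formed(node: RSTNode) -> bool:
    """Check the assumed constraints: leaves carry text; internal nodes have a
    relation, at least two children, and at least one nucleus among them."""
    if node.is_leaf():
        return node.text is not None
    if node.relation is None or len(node.children) < 2:
        return False
    if not any(child.nuclearity == "Nucleus" for child in node.children):
        return False
    return all(is_well_formed(child) for child in node.children)


# Invented two-EDU example (not taken from the corpus).
tree = RSTNode(
    nuclearity="Root",
    relation="elaboration",
    children=[
        RSTNode(nuclearity="Nucleus", text="The company reported a loss,"),
        RSTNode(nuclearity="Satellite", text="which analysts had expected."),
    ],
)

print(tree.edus())
print(is_well_formed(tree))
```

Storing children in reading order keeps the contiguity property implicit, so no explicit span indices are needed for this illustration. A loader for the actual treebank files would need to follow the format documented with the release.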

Provided by:
Papers with Code