GUM(Georgetown University Multilayer corpus)

Name: GUM(Georgetown University Multilayer corpus)
Creator: OpenDataLab
Published: 2026-05-24 10:30:26
License: No description available

OpenDataLab | Updated 2026-05-24 | Indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/GUM

Resource overview:

GUM is an open-source, multilayer corpus of English containing richly annotated texts from 12 text types. The annotations include: multiple part-of-speech (POS) tags, morphological features, and lemmatization; sentence segmentation and coarse speech acts; document structure in TEI XML (paragraphs, headings, figures, etc.); ISO date/time annotations; speaker and addressee information (where relevant); constituent and dependency syntax; information status (given, accessible, new, split antecedents); entity and coreference annotation, including bridging anaphora; entity linking (Wikification); and Rhetorical Structure Theory (RST) discourse parses and discourse dependencies.
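To make the token-level layers more concrete, here is a minimal sketch of reading POS tags, lemmas, morphological features, and dependency heads from a CoNLL-U-style tab-separated block. It assumes the dependency layer is available in CoNLL-U layout, which this listing does not state. The sample sentence is made up for illustration and is not taken from GUM, and the names `Token`, `parse_feats`, `parse_sentence`, and `head_form` are hypothetical helpers.

```python
from dataclasses import dataclass

# Illustrative sample only: a made-up sentence in CoNLL-U layout, not GUM data.
SAMPLE = (
    "# text = Dogs bark loudly\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "3\tloudly\tloudly\tADV\tRB\t_\t2\tadvmod\t_\t_\n"
)


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface word form
    lemma: str    # lemmatized form
    upos: str     # universal POS tag
    xpos: str     # language-specific POS tag
    feats: dict   # morphological features, e.g. {"Number": "Plur"}
    head: int     # index of the syntactic head (0 means root)
    deprel: str   # dependency relation to the head


def parse_feats(field):
    """Turn 'A=x|B=y' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse_sentence(block):
    """Parse one sentence block, skipping comments and non-token rows."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword ranges ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            cols[4], parse_feats(cols[5]),
                            int(cols[6]), cols[7]))
    return tokens


def head_form(tokens, tok):
    """Return the word form of a token's head, or 'ROOT' for the root."""
    if tok.head == 0:
        return "ROOT"
    by_idx = {t.idx: t for t in tokens}
    return by_idx[tok.head].form


tokens = parse_sentence(SAMPLE)
for tok in tokens:
    print(tok.form, tok.lemma, tok.upos, tok.xpos, tok.deprel,
          head_form(tokens, tok))
```

Other layers named above, such as TEI XML document structure, coreference, and RST discourse parses, are not covered by this sketch and would need their own readers.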

Provider:

OpenDataLab

Created:

2022-08-16

Collection summary

Dataset introduction