ECB+ (extension to the EventCorefBank)

Source: OpenDataLab | Updated: 2026-05-24 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/ECB_plus
Resource description:

The ECB+ corpus is an extension of the EventCorefBank (ECB; Bejan & Harabagiu, 2010). A newly added corpus component consists of 502 documents that belong to the 43 topics of the ECB but describe seminal events different from those already captured in the ECB. All corpus texts were found through Google Search and are annotated with mentions of events and their times, locations, and human and non-human participants, as well as with intra-document and cross-document coreference information for events and entities. Following the ECB+ annotation guidelines, the 2012 version of the ECB annotation (Lee et al., 2012) was used as the starting point for re-annotating the ECB.

The main differences from the 2012 ECB annotation are:

(a) Five event components are annotated in the text:
- Actions (annotation tags starting with ACTION and NEG)
- Times (annotation tags starting with TIME)
- Locations (annotation tags starting with LOC)
- Human participants (annotation tags starting with HUMAN)
- Non-human participants (annotation tags starting with NON_HUMAN)

(b) Specific action classes and entity subtypes are distinguished for each of the five main event components, in accordance with the ACE annotation guidelines (LDC 2008) and TimeML (Pustejovsky et al., 2003; Sauri et al., 2003, 2005), resulting in a total tag set of 30 annotation tags.

(c) Intra-document and cross-document coreference relations are established between mentions of the five event components:
- The INTRA_DOC_COREF tag captures intra-document coreference chains that do not participate in cross-document relations; intra-document coreference was annotated with the CAT tool (Bartalesi et al., 2012).
- The CROSS_DOC_COREF tag denotes cross-document coreference relations created in the CROMER tool (Girardi et al., 2014); all coreference chains point, via their relation target IDs, to so-called TAG_DESCRIPTORS, which in turn point to human-friendly instance names (assigned by the coders) and to instance IDs.

(d) Events are annotated from an "event-centric" perspective, i.e., annotation tags are assigned according to the role a mention plays in an event (for more information, see the ECB+ references).
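The prefix convention in (a) and the chain structure in (c) can be illustrated with a minimal Python sketch. It is not the corpus's official tooling: the `Mention` record, the function names, the example tag names, and the rule that a chain counts as cross-document when it spans more than one document are all assumptions made here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Tag prefix -> event component, following list (a) in the description.
# NON_HUMAN and HUMAN do not overlap as prefixes, so the order is not critical.
COMPONENT_PREFIXES = (
    ("ACTION", "action"),
    ("NEG", "action"),
    ("TIME", "time"),
    ("LOC", "location"),
    ("NON_HUMAN", "non-human participant"),
    ("HUMAN", "human participant"),
)


def component_of(tag):
    """Return the event component for an annotation tag, or None if no prefix matches."""
    for prefix, component in COMPONENT_PREFIXES:
        if tag.startswith(prefix):
            return component
    return None


@dataclass(frozen=True)
class Mention:
    """Hypothetical in-memory record for one annotated mention."""
    doc_id: str
    tag: str
    instance_id: str  # assumed to identify the instance the mention's chain points to
    text: str


def group_by_instance(mentions):
    """Group mentions into coreference chains keyed by instance ID."""
    chains = defaultdict(list)
    for mention in mentions:
        chains[mention.instance_id].append(mention)
    return dict(chains)


def relation_type(chain):
    """Assumed rule: a chain spanning several documents is cross-document."""
    doc_ids = {mention.doc_id for mention in chain}
    return "CROSS_DOC_COREF" if len(doc_ids) > 1 else "INTRA_DOC_COREF"


# Illustrative data only; document IDs, tags, and instance IDs are made up.
mentions = [
    Mention("doc_a", "ACTION_OCCURRENCE", "ACT_1", "announced"),
    Mention("doc_b", "ACTION_OCCURRENCE", "ACT_1", "announcement"),
    Mention("doc_a", "TIME_DATE", "TIM_1", "Monday"),
]
chains = group_by_instance(mentions)
for instance_id, chain in chains.items():
    print(instance_id, component_of(chain[0].tag), relation_type(chain))
```

Keying chains by instance ID mirrors the description's statement that coreference chains resolve to shared instance IDs; a real loader would read these IDs from the corpus files rather than from hand-written records.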
Provider:
OpenDataLab
Created:
2022-06-23
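Collected summary
Dataset introduction
Background and challenges
Background overview
ECB+ is an extension of the EventCorefBank corpus whose newly added component contains 502 documents. It focuses on cross-document coreference resolution of events and their components, using 30 tags to annotate event elements such as actions, times, and locations in detail. The dataset was released by Vrije Universiteit Amsterdam in 2014 and is suited to research on event coreference resolution.
The above content was collected and summarized by 遇见数据集.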