
Annotated corpora and tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (edition 1.1)

Source: hdl.handle.net (indexed 2025-01-15)
Download link: http://hdl.handle.net/11372/LRT-2842
Description:
This multilingual resource contains corpora in which verbal multiword expressions (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). VMWEs were annotated in 19 languages according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format, and were used in edition 1.1 of the PARSEME Shared Task (2018). For most languages, morphological and syntactic information (not necessarily using UD tagsets) is also provided, including parts of speech, lemmas, morphological features and/or syntactic dependencies. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.1 (2018). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1
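
To make the cupt format more concrete, here is a minimal Python sketch of how such a file might be read. It rests on assumptions not stated in the description above and should be checked against the official format documentation: that the file begins with a `# global.columns = ...` header naming tab-separated columns, that the last column is called `PARSEME:MWE`, that `*` and `_` mark tokens with no or unspecified annotation, and that the first token of a VMWE carries `id:category` while later tokens carry only the id. The function names `read_cupt` and `extract_vmwes` and the sample sentence are hypothetical and hand-written for illustration; they are not part of the released tools or data.

```python
from collections import defaultdict

# Hand-written illustrative sentence (not taken from the corpora).
SAMPLE = """# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE
# text = She made a decision
1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\t*
2\tmade\tmake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full
3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*
4\tdecision\tdecision\tNOUN\t_\t_\t2\tobj\t_\t_\t1
"""


def read_cupt(lines):
    """Yield each sentence as a list of {column name: value} dicts."""
    columns = None
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns ="):
            # Assumed header: column names separated by spaces.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment lines (sentence id, text, ...)
        elif not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        else:
            if columns is None:
                raise ValueError("missing '# global.columns' header")
            sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:
        yield sentence


def extract_vmwes(sentence):
    """Return {vmwe_id: (category, [token forms])} for one sentence."""
    categories = {}
    members = defaultdict(list)
    for token in sentence:
        tag = token.get("PARSEME:MWE", "*")
        if tag in ("*", "_"):
            continue  # token is not (known to be) part of a VMWE
        # A token may belong to several VMWEs, separated by ';'.
        for part in tag.split(";"):
            mwe_id, _, category = part.partition(":")
            if category:
                categories[mwe_id] = category
            members[mwe_id].append(token["FORM"])
    return {i: (categories.get(i), forms) for i, forms in members.items()}


if __name__ == "__main__":
    for sent in read_cupt(SAMPLE.splitlines()):
        print(extract_vmwes(sent))
```

On the sample sentence, the sketch groups "made" and "decision" under VMWE id 1 with category LVC.full. Reading column names from the header rather than hard-coding positions is a deliberate choice here, since the description notes that the available morphological and syntactic information varies by language.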

Provider: hdl.handle.net