2007 CoNLL Shared Task - Arabic & English
catalogue.elra.info | Updated: 2017-12-21 | Indexed: 2025-03-27
Download link:
https://catalogue.elra.info/en-us/repository/browse/ELRA-W0123/
Resource description:
2007 CoNLL Shared Task - Arabic & English consists of dependency treebanks in two languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are Arabic and English.

The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural language processing applications and evaluate them in a standard setting. In 2006 and 2007, the shared task was devoted to the parsing of syntactic dependencies using corpora from up to thirteen languages. The task aimed to define and extend the then-current state of the art in dependency parsing, a technology that complemented previous tasks by producing a different kind of syntactic description of input text. The 2007 shared task added a domain adaptation track for English in addition to the multilingual track. More information about CoNLL and the 2007 shared task is available at http://www.signll.org/conll/ and http://www.conll.org/previous-tasks, respectively.

The source data in the treebanks in this release consists principally of various texts (e.g., textbooks, news, literature) annotated in dependency format. In general, dependency grammar is based on the idea that the verb is the center of the clause structure and that other units in the sentence are connected to the verb by directed links, or dependencies. The correspondence is one-to-one: for every element in the sentence there is exactly one node in the sentence structure that corresponds to that element. In constituency or phrase structure grammars, on the other hand, clauses are divided into noun phrases and verb phrases, and one or more nodes in the sentence structure may correspond to a single element.
All of the data sets in this release are dependency treebanks. The individual data sets are: Prague Arabic Dependency Treebank (Arabic); CHILDES (English); PennBioIE Oncology 1.0 (English); PennBioIE CYP 1.0 (English); Treebank-3 (English). Licensing instructions: https://catalog.ldc.upenn.edu/LDC2017T21.
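To make the one-node-per-element idea concrete, the following is a minimal sketch of reading a dependency-annotated sentence. It assumes the files use the 10-column, tab-separated layout of the CoNLL 2006 and 2007 shared tasks (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). The sample sentence, its tags, and the helper names `parse_sentence` and `dependents_of` are hypothetical and are not taken from the release.

```python
# Illustrative only: a made-up three-token sentence in the assumed
# 10-column layout. Tags and relation labels are hypothetical.
SAMPLE = (
    "1\tDogs\tdog\tN\tNNS\t_\t2\tSBJ\t_\t_\n"
    "2\tchase\tchase\tV\tVBP\t_\t0\tROOT\t_\t_\n"
    "3\tcats\tcat\tN\tNNS\t_\t2\tOBJ\t_\t_\n"
)


def parse_sentence(block):
    """Return one dict per token line; a HEAD of 0 marks the sentence root."""
    tokens = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        tokens.append(
            {
                "id": int(cols[0]),     # 1-based token index
                "form": cols[1],        # surface word form
                "head": int(cols[6]),   # index of the governing token
                "deprel": cols[7],      # label of the dependency link
            }
        )
    return tokens


def dependents_of(tokens, head_id):
    """List the word forms whose HEAD column points at head_id."""
    return [t["form"] for t in tokens if t["head"] == head_id]


tokens = parse_sentence(SAMPLE)
root = next(t for t in tokens if t["head"] == 0)
print(root["form"], dependents_of(tokens, root["id"]))
```

Each input line yields exactly one node, and every node except the root carries a single directed link to its head, which is the property the description above contrasts with phrase structure trees.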
Provider:
ELRA Catalogue of Language Resources



