
2006 CoNLL Shared Task - Ten Languages

Source: DataCite Commons | Updated: 2021-07-01 | Indexed: 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2015T11
Resource description:
<h3>Introduction</h3><br> <p>2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The languages covered in this release are: Bulgarian, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish and Turkish.</p><br> <p>LDC also released the following 2006 &amp; 2007 CoNLL Shared Task corpora:</p><br> <ul><br> <li>2007 CoNLL Shared Task - Basque, Catalan, Czech &amp; Turkish (<a href="../../../LDC2018T06">LDC2018T06</a>)</li><br> <li>2007 CoNLL Shared Task - Greek, Hungarian &amp; Italian (<a href="../../../LDC2018T07">LDC2018T07</a>)</li><br> <li>2006 CoNLL Shared Task - Arabic &amp; Czech (<a href="../../../LDC2015T12">LDC2015T12</a>)</li><br> </ul><br> <p>This corpus is cross-listed and jointly released with ELRA as&nbsp;<a href="http://catalog.elra.info/product_info.php?products_id=1250">ELRA-W0086</a>.</p><br> <p>The <a href="http://ifarm.nl/signll/conll/" rel="nofollow">Conference on Computational Natural Language Learning (CoNLL)</a> is accompanied every year by a shared task intended to promote natural language processing applications and evaluate them in a standard setting. In 2006, the shared task was devoted to the parsing of syntactic dependencies using corpora from up to thirteen languages. The task aimed to define and extend the then-current state of the art in dependency parsing, a technology that complemented previous tasks by producing a different kind of syntactic description of input text. More information about the 2006 shared task is available on the <a href="http://www.conll.org/previous-tasks">CoNLL-X web page</a>.</p><br> <p>LDC has released data sets from other CoNLL shared tasks. 
2008 CoNLL Shared Task Data&nbsp;contains the English material used in the 2008 shared task, which focused on English, employed a unified dependency-based formalism, and merged the tasks of syntactic dependency parsing, identifying semantic arguments and labeling them with semantic roles. 2009 CoNLL Shared Task Data Parts 1 and 2 consist of the English, Catalan, Chinese, Czech, German and Spanish resources used in the 2009 task, which included a comparison of time and space complexity based on participants' input and a learning curve comparison for languages with large datasets.</p><br> <p>LDC has also released the following CoNLL Shared Task data sets:</p><br> <ul><br> <li>2006 CoNLL Shared Task - Arabic &amp; Czech (<a href="../../../LDC2015T12">LDC2015T12</a>)</li><br> <li>2008 CoNLL Shared Task Data (<a href="../../../LDC2009T12">LDC2009T12</a>)</li><br> <li>2009 CoNLL Shared Task Part 1 (<a href="../../../LDC2012T03">LDC2012T03</a>)</li><br> <li>2009 CoNLL Shared Task Part 2 (<a href="../../../LDC2012T04">LDC2012T04</a>)</li><br> <li>2015-2016 CoNLL Shared Task (<a href="../../../LDC2017T13">LDC2017T13</a>)</li><br> </ul><br> <h3>Data</h3><br> <p>The source data in the treebanks in this release consists principally of various texts (e.g., textbooks, news, literature) annotated in dependency format. In general, dependency grammar is based on the idea that the verb is the center of the clause structure and that other units in the sentence are connected to the verb by directed links, or dependencies. The result is a one-to-one correspondence: every element in the sentence corresponds to exactly one node in the sentence structure. In constituency or phrase structure grammars, on the other hand, clauses are divided into noun phrases and verb phrases, and in each sentence one or more nodes may correspond to one element. The Penn Treebank (<a href="../../../LDC99T42">LDC99T42</a>) is an example of a constituency or phrase structure approach. 
All of the data sets in this release are dependency treebanks.</p><br> <p>The individual data sets are:</p><br> <ul><br> <li><a href="http://bultreebank.org/bg/">BulTreeBank</a>&nbsp;(Bulgarian)</li><br> <li><a href="http://mbkromann.github.io/copenhagen-dependency-treebank/">The Danish Dependency Treebank</a>&nbsp;(Danish)</li><br> <li><a href="http://odur.let.rug.nl/~vannoord/trees/">The Alpino Treebank</a> (Dutch)</li><br> <li><a href="http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html">The TIGER Corpus</a> (German)</li><br> <li><a href="http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-js.html">Treebank Tuba-J/S</a> (Japanese)</li><br> <li><a href="http://www.linguateca.pt/floresta/info_floresta_English.html">Floresta Sinta(c)tica</a> (Portuguese)</li><br> <li><a href="http://nl.ijs.si/sdt/">Slovene Dependency Treebank, SDT V0.1</a> (Slovene)</li><br> <li><a href="http://www.dlsi.ua.es/projectes/3lb/index_en.html">Cast3LB</a> (Spanish)</li><br> <li><a href="http://stp.lingfil.uu.se/~nivre/research/Talbanken05.html">Talbanken05</a> (Swedish)</li><br> <li><a href="https://web.itu.edu.tr/gulsenc/treebanks.html">METU-Sabanci Turkish Treebank</a>&nbsp;(Turkish)</li><br> </ul><br> <h3>Samples</h3><br> <p>Please view these <a href="desc/addenda/LDC2015T11.jpn.txt">Japanese</a> and <a href="desc/addenda/LDC2015T11.bul.txt">Bulgarian</a> samples.</p><br> <h3>Updates</h3><br> <p>None at this time.</p><br> Portions © 2002-2005 Gosse Bouma, © 2002-2004 Mattias Buch-Kromann, © 2006 Eberhard-Karls Universitaet Tuebingen, Seminar fuer Sprachwissenschaft, Abt. 
Computerlinguistik, © 2006 Jan Einarsson, © 2002-2005 Geert Kloosterman, © 2002-2005 Robert Malouf, © 2006 Joakim Nivre, © 2006 Technical University of Catalonia, © 2006 Technical University of Valencia, © 2002-2004 The Department of International Language Studies and Computational Linguistics at the Copenhagen Business School, © 1998 The Society for Danish Language and Literature, © 2006 University of Alicante, © 2006 University of Barcelona, © 2002-2005 University of Groningen, © 2002-2005 Leonoor van der Beek, © 2002-2005 Gertjan van Noord, © 2015 Trustees of the University of Pennsylvania
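The one-to-one correspondence described in the Data section can be illustrated with a minimal Python sketch. It is not part of the LDC release and does not read the corpus files: the sentence, the head indices, and the names `TOKENS`, `HEADS`, `dependency_arcs` and `is_tree` are all hypothetical and chosen only for illustration. Each token gets exactly one head, so the structure has one node per token and every link is a directed head-to-dependent arc.

```python
from typing import List, Tuple

# Hypothetical example sentence; it is NOT taken from any treebank in this release.
TOKENS: List[str] = ["She", "reads", "old", "books"]
# HEADS[i] is the 1-based position of the head of token i+1; 0 marks the root.
# Here the verb "reads" is the root, matching the verb-centered view of the clause.
HEADS: List[int] = [2, 0, 4, 2]


def dependency_arcs(tokens: List[str], heads: List[int]) -> List[Tuple[str, str]]:
    """Return one (head_word, dependent_word) arc per token, using "ROOT" for head 0."""
    if len(tokens) != len(heads):
        raise ValueError("each token needs exactly one head")
    arcs = []
    for position, head in enumerate(heads, start=1):
        if not 0 <= head <= len(tokens) or head == position:
            raise ValueError(f"invalid head {head} for token {position}")
        head_word = "ROOT" if head == 0 else tokens[head - 1]
        arcs.append((head_word, tokens[position - 1]))
    return arcs


def is_tree(heads: List[int]) -> bool:
    """Check for a single root and that every token reaches it without a cycle.

    Assumes every head index is already in range (0..len(heads)).
    """
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False  # followed the head chain back to a visited token
            seen.add(node)
            node = heads[node - 1]
    return True


ARCS = dependency_arcs(TOKENS, HEADS)
```

Because `ARCS` always has the same length as `TOKENS`, the sketch makes the contrast with phrase structure trees concrete: a constituency analysis of the same sentence would add extra nodes for phrases, while the dependency analysis adds none.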
Provider:
Linguistic Data Consortium
Created:
2020-11-30