Prague Czech-English Dependency Treebank 2.0
DataCite Commons · Updated 2021-07-01 · Indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2012T08
Resource description:
<h3>Introduction</h3><p>Prague Czech-English Dependency Treebank (PCEDT) 2.0 was developed by the <a href="http://ufal.mff.cuni.cz/" rel="nofollow">Institute of Formal and Applied Linguistics</a> at <a href="http://www.cuni.cz/" rel="nofollow">Charles University</a> in Prague, Czech Republic. It is a corpus of Czech-English parallel resources that have been translated, aligned, and manually annotated for dependency structure, semantic labeling, argument structure, ellipsis, and anaphora resolution. This release updates Prague Czech-English Dependency Treebank 1.0 (<a href="http://catalog.ldc.upenn.edu/LDC2004T25" rel="nofollow">LDC2004T25</a>) by adding English newswire texts, so that it now contains over two million words in close to 100,000 sentences.</p><h3>Data</h3><p>The principal new material in PCEDT 2.0 is the entire Wall Street Journal data from Treebank-3 (<a href="http://catalog.ldc.upenn.edu/LDC99T42" rel="nofollow">LDC99T42</a>). The following PCEDT 1.0 material is not included: the Reader's Digest material, the Czech monolingual corpus, and the English-Czech dictionary.</p><p>Each section is enhanced with comprehensive manual linguistic annotation in the style of the Prague Dependency Treebank (<a href="http://catalog.ldc.upenn.edu/LDC2006T01" rel="nofollow">LDC2006T01</a>, Prague Dependency Treebank 2.0). 
The main features of this annotation style are:</p><ul> <li>dependency structure of the content words and of coordination and similar structures (function words are represented as attribute values of the content words)</li> <li>semantic labeling of content words and of the types of coordinating structures</li> <li>argument structure, including an argument structure (valency) lexicon for both languages</li> <li>ellipsis and anaphora resolution</li> </ul><p>This annotation style is called <strong>tectogrammatical annotation</strong>, and it constitutes the <strong>tectogrammatical layer</strong> of the corpus.</p><p>Please consult the PCEDT <a href="http://ufal.mff.cuni.cz/pcedt2.0/" rel="nofollow">website</a> for more information and documentation.</p><h3>Samples</h3><p>Please follow this <a href="./desc/addenda/LDC2012T08.jpg" rel="nofollow">link</a> for a sample of the data included.</p><h3>Updates</h3><p>None at this time.</p>
Portions © 1987-1989 Dow Jones & Company, Inc., © 2002-2012 Charles University in Prague, Institute of Formal and Applied Linguistics, © 1999, 2004, 2012 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30