
Coordination Annotation for the Penn Treebank

DataCite Commons, updated 2021-07-01, indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2015T08
Description:
<h3>Introduction</h3><br>
<p>Coordination Annotation for the Penn Treebank&nbsp;is a stand-off annotation for the Wall Street Journal portion of Treebank-3 (PTB3) (<a href="../../../LDC99T42">LDC99T42</a>), developed by researchers at the <a href="http://www.uni-duesseldorf.de/home/en/home.html">University of D&uuml;sseldorf</a> and <a href="http://www.iub.edu/">Indiana University</a>. It marks all tokens that have a coordinating function (potentially among other functions).</p><br>
<p>Coordination is a syntactic structure that links together two or more elements known as conjuncts or conjoins. In English, coordination is often signaled by a coordinator (coordinating conjunction) such as <em>and</em>, <em>or</em>, or <em>but</em>.</p><br>
<p>Penn Coordination Annotation is available at no cost to all licensees of <a href="../../../LDC99T42">PTB3</a> and appears in their download queue associated with LDC99T42 as <em>penn_coordination_anno_LDC2015T08.tgz</em>.</p><br>
<h3>Data</h3><br>
<p>The annotation is provided as a single UTF-8 plain-text TSV file with the following columns:</p><br>
<ul><br>
<li>section: Penn Treebank WSJ section number</li><br>
<li>file: number of the file within the section</li><br>
<li>sentence: number of the sentence (starting with 0)</li><br>
<li>token: number of the token (starting with 0)</li><br>
<li>annotation: "P" if the token is a coordinating punctuation mark, "O" otherwise</li><br>
</ul><br>
<h3>Samples</h3><br>
<p>Please view this <a href="desc/addenda/LDC2015T08.txt">sample</a>.</p><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 2015 Sandra Kübler, Wolfgang Maier, Trustees of the University of Pennsylvania
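<p>The five-column layout described under Data can be read with a few lines of Python. The following is a minimal sketch, not an official loader: it assumes the file has no header row (rows whose sentence or token field is not numeric are skipped), keeps section and file as strings in case they carry leading zeros, and uses made-up sample rows rather than real corpus data. The names <em>CoordRow</em>, <em>parse_rows</em> and <em>coordinating_positions</em> are hypothetical.</p>

```python
import csv
from typing import Iterable, List, NamedTuple, Set, Tuple


class CoordRow(NamedTuple):
    """One row of the stand-off annotation (hypothetical container)."""
    section: str      # WSJ section number, kept as text
    file: str         # file number within the section, kept as text
    sentence: int     # 0-based sentence index
    token: int        # 0-based token index
    annotation: str   # "P" (coordinating punctuation) or "O"


def parse_rows(lines: Iterable[str]) -> List[CoordRow]:
    """Parse tab-separated lines into CoordRow records.

    Rows that do not have exactly five fields, or whose sentence/token
    fields are not numeric (e.g. a possible header), are skipped.
    """
    rows = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 5:
            continue
        section, file, sentence, token, annotation = fields
        if not (sentence.isdigit() and token.isdigit()):
            continue
        rows.append(CoordRow(section, file, int(sentence), int(token), annotation))
    return rows


def coordinating_positions(rows: Iterable[CoordRow]) -> Set[Tuple[str, str, int, int]]:
    """Return (section, file, sentence, token) keys of tokens marked "P"."""
    return {(r.section, r.file, r.sentence, r.token)
            for r in rows if r.annotation == "P"}


# Made-up rows for illustration only; not taken from the corpus.
sample = [
    "00\t01\t0\t3\tO",
    "00\t01\t0\t4\tP",
    "00\t01\t1\t0\tO",
]
rows = parse_rows(sample)
marked = coordinating_positions(rows)
print(sorted(marked))
```

<p>Because the annotation is stand-off, the resulting keys are meant to be joined against the tokens of a licensed copy of the PTB3 WSJ trees; to read the real file, pass an open file handle (opened with <em>encoding="utf-8"</em>) to <em>parse_rows</em> instead of the sample list.</p>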

Provider:
Linguistic Data Consortium
Created:
2020-11-30