
English News Text Treebank: Penn Treebank Revised

Source: DataCite Commons; updated 2021-07-01; indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2015T13
Description:
<h3>Introduction</h3><br>
<p>English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It consists of a combination of automated and manual revisions of the <a href="../../../LDC99T42">Penn Treebank</a> annotation of Wall Street Journal (WSJ) stories. The data comprises 1,203,648 word-level tokens in 49,191 sentence-level tokens, covering all 2,312 of the original Penn Treebank WSJ files.</p><br>
<h3>Data</h3><br>
<p>This release includes revised tokenization, part-of-speech, and syntactic treebank annotation intended to bring the full WSJ treebank section into compliance with the agreed-upon policies and updates implemented for current English treebank annotation specifications at LDC. Treebanks annotated under those specifications include English Web Treebank (<a href="../../../LDC2012T13">LDC2012T13</a>), OntoNotes (<a href="../../../LDC2013T19">LDC2013T19</a>), and English translation treebanks such as English Translation Treebank: An-Nahar Newswire (<a href="../../../LDC2012T02">LDC2012T02</a>). English Treebank Supplemental Guidelines are included in this release.</p><br>
<h3>Samples</h3><br>
<p>Please view these <a href="desc/addenda/LDC2015T13.tree.txt">treebank</a> and <a href="desc/addenda/LDC2015T13.txt">tokenized</a> samples.</p><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 1987-1989 Dow Jones &amp; Company, Inc., © 1999, 2015 Trustees of the University of Pennsylvania
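The description above mentions tokenization, part-of-speech, and syntactic annotation layers. As a rough illustration of how those layers relate, here is a minimal Python sketch that reads one tree and recovers its (token, POS tag) pairs. It assumes the trees use the usual Penn Treebank bracketed notation, which this listing does not spell out. The sample sentence is invented for illustration and is not taken from the corpus, and the function names (`tokenize`, `parse`, `tagged_leaves`) are hypothetical helpers, not part of any LDC tooling.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Split a bracketed tree into symbols, keeping parentheses separate."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(symbols: List[str], pos: int = 0):
    """Parse one constituent starting at symbols[pos], which must be '('.

    Returns ((label, children), next_position). A child is either a nested
    (label, children) tuple or a plain string (a word).
    """
    if symbols[pos] != "(":
        raise ValueError("expected '(' at position %d" % pos)
    pos += 1
    label = ""
    # Some treebank files wrap each tree in an extra unlabeled bracket,
    # so the label is allowed to be empty.
    if symbols[pos] not in ("(", ")"):
        label = symbols[pos]
        pos += 1
    children = []
    while symbols[pos] != ")":
        if symbols[pos] == "(":
            child, pos = parse(symbols, pos)
            children.append(child)
        else:
            children.append(symbols[pos])
            pos += 1
    return (label, children), pos + 1


def tagged_leaves(tree) -> List[Tuple[str, str]]:
    """Collect (token, POS tag) pairs from the preterminal nodes."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_leaves(child))
    return pairs


# Invented example sentence, not drawn from the corpus.
SAMPLE = "(S (NP (DT The) (NN market)) (VP (VBD rose)) (. .))"

tree, _ = parse(tokenize(SAMPLE))
pairs = tagged_leaves(tree)
print(pairs)
```

The tokens come from the leaves, the POS tags from the nodes directly above them, and the syntactic annotation from the nesting of the remaining brackets. For real work, an established treebank reader is likely a better choice than this hand-rolled parser.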
Provider:
Linguistic Data Consortium
Created:
2020-11-30