
BLLIP 1987-89 WSJ Corpus Release 1

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2000T43
Resource description:
Introduction

Brown Laboratory for Linguistic Information Processing (BLLIP) 1987-89 WSJ Corpus Release 1 contains a complete, Treebank-style part-of-speech (POS) tagged and parsed version of the three-year Wall Street Journal (WSJ) collection from ACL/DCI (LDC93T1), approximately 30 million words. The annotation was performed using statistically based methods developed by BLLIP researchers Eugene Charniak, Don Blaheta, Niyu Ge, Keith Hall, John Hale and Mark Johnson.

This corpus both overlaps and supplements the million-word Penn Treebank (PTB) collection of parsed and POS-tagged WSJ texts.

Data

The PTB project selected 2,499 stories from a three-year WSJ collection of 98,732 stories for syntactic annotation. These 2,499 stories are distributed in Treebank-2 (LDC95T7) and Treebank-3 (LDC99T42), both of which include the raw text for each story.

Updates

There are no updates at this time.

Portions © 1987-1989 Dow Jones & Company, Inc., © 2000 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30