
BBN Pronoun Coreference and Entity Type Corpus

DataCite Commons. Updated 2021-07-01; indexed 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2005T33
Description:
<h3>Introduction</h3>
<p>This file contains documentation on the BBN Pronoun Coreference and Entity Type Corpus, Linguistic Data Consortium (LDC) catalog number LDC2005T33 and ISBN 1-58563-362-3.</p>
<p>This publication supplements the one-million-word Penn Treebank corpus of Wall Street Journal texts (LDC95T7). The corpus contains stand-off annotation of pronoun coreference, indicated by sentence and token numbers, as well as annotation of a variety of entity and numeric types. All annotation was done by hand at BBN using proprietary annotation tools. The corpus was developed by BBN to support the ACE and AQUAINT programs.</p>
<p>The corpus contains two components:</p>
<ul>
<li> <p>Pronoun coreference. Stand-off annotation of pronoun coreference for the WSJ corpus is provided in a single file. Pronouns and antecedents are indexed by sentence and token numbers.</p> </li>
<li> <p>Entity types. The corpus includes annotation of 12 named entity types (Person, Facility, Organization, GPE, Location, Nationality, Product, Event, Work of Art, Law, Language, and Contact-Info), ten nominal entity types (Person, Facility, Organization, GPE, Product, Plant, Animal, Substance, Disease, and Game), and seven numeric types (Date, Time, Percent, Money, Quantity, Ordinal, and Cardinal). Several of these types are further divided into subtypes; annotation is provided for a total of 64 subtypes.</p> </li>
</ul>
<h3>Samples</h3>
<p>For examples of the data in this corpus, please examine the following samples:</p>
<ul>
<li> <a href="desc/addenda/LDC2005T33.qa" rel="nofollow">LDC2005T33.qa</a> </li>
<li> <a href="desc/addenda/LDC2005T33_pron.txt" rel="nofollow">LDC2005T33_pron.txt</a> </li>
<li> <a href="desc/addenda/LDC2005T33_sent.txt" rel="nofollow">LDC2005T33_sent.txt</a> </li>
</ul>
<br> Portions © 1989 Wall Street Journal, © 2002 BBNT Solutions LLC., © 2005 Trustees of the University of Pennsylvania.
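Because the pronoun annotation is stand-off, each coreference link points into the Treebank text by sentence and token number instead of repeating the words. The following is a minimal sketch of how such pointers might be resolved against tokenized sentences. It rests on stated assumptions: the `Span` and `PronounLink` types and the `span_text`/`resolve` functions are hypothetical, indices are assumed 0-based with inclusive token ranges, and the code does not parse the actual layout of `LDC2005T33_pron.txt`. Whether the corpus's token numbers count Penn Treebank empty elements (traces) should be checked against the corpus documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record types for illustration only; the real LDC2005T33
# file layout is not reproduced here. Assumption: 0-based sentence and
# token numbers, with inclusive token ranges.


@dataclass(frozen=True)
class Span:
    sentence: int  # sentence number within the document
    start: int     # first token number (inclusive)
    end: int       # last token number (inclusive)


@dataclass(frozen=True)
class PronounLink:
    pronoun: Span
    antecedent: Span


def span_text(sentences: List[List[str]], span: Span) -> str:
    """Resolve a stand-off span to the tokens it points at."""
    if not 0 <= span.sentence < len(sentences):
        raise IndexError(f"sentence {span.sentence} out of range")
    tokens = sentences[span.sentence]
    if not 0 <= span.start <= span.end < len(tokens):
        raise IndexError(f"span {span} out of range for {len(tokens)} tokens")
    return " ".join(tokens[span.start : span.end + 1])


def resolve(
    sentences: List[List[str]], links: List[PronounLink]
) -> List[Tuple[str, str]]:
    """Return (pronoun text, antecedent text) for each coreference link."""
    return [
        (span_text(sentences, link.pronoun), span_text(sentences, link.antecedent))
        for link in links
    ]


if __name__ == "__main__":
    # Toy, made-up sentences; not taken from the corpus.
    sents = [
        ["Mary", "bought", "a", "car", "."],
        ["She", "drove", "it", "home", "."],
    ]
    links = [
        PronounLink(pronoun=Span(1, 0, 0), antecedent=Span(0, 0, 0)),
        PronounLink(pronoun=Span(1, 2, 2), antecedent=Span(0, 2, 3)),
    ]
    for pron, ante in resolve(sents, links):
        print(f"{pron} -> {ante}")  # "She -> Mary", then "it -> a car"
```

Keeping the annotation stand-off leaves the Treebank files untouched; the trade-off is that any retokenization of the underlying text invalidates the stored indices.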
Provider:
Linguistic Data Consortium
Created:
2020-11-30