
TAC KBP Reference Knowledge Base

DataCite Commons | Updated 2021-07-01 | Indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2014T16
Description:
<h3>Introduction</h3><br>
<p>TAC KBP Reference Knowledge Base was developed by the Linguistic Data Consortium (LDC) in support of the NIST-sponsored TAC-KBP evaluation series. It is a knowledge base built from English Wikipedia articles and their associated infoboxes, and it covers over 800,000 entities. LDC also released TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014 (<a href="../../../LDC2016T26">LDC2016T26</a>).</p><br>
<p><a href="http://www.nist.gov/tac/">TAC</a> (Text Analysis Conference) is a series of workshops organized by <a href="http://www.nist.gov/">NIST</a> (the National Institute of Standards and Technology) to encourage research in natural language processing and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. TAC's KBP (Knowledge Base Population) track encourages the development of systems that can match entities mentioned in natural texts with those appearing in a knowledge base, extract novel information about entities from a document collection, and add that information to a new or existing knowledge base.</p><br>
<p>Consult the LDC <a href="https://www.ldc.upenn.edu/collaborations/current-projects/tac-kbp">TAC-KBP</a> project page for further information about LDC's resource development for the TAC-KBP program.</p><br>
<h3>Data</h3><br>
<p>The source data (Wikipedia infoboxes and articles) was taken from an October 2008 snapshot of Wikipedia.</p><br>
<p>TAC KBP Reference Knowledge Base contains a set of entities, each with a canonical name and title for the Wikipedia page, an entity type, an automatically parsed version of the data from the infobox in the entity's Wikipedia article, and a stripped version of the text of the Wikipedia article. Each entity is assigned one of four types: PER (person), ORG (organization), GPE (geo-political entity), or UKN (unknown).</p><br>
<p>All data files are presented as UTF-8 encoded XML.</p><br>
<h3>Samples</h3><br>
<p>Please view the following <a href="desc/addenda/LDC2014T16.xml">sample</a>.</p><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 2008-2009, 2014 Trustees of the University of Pennsylvania
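As a rough illustration of the entity records described under Data, the sketch below parses a small XML fragment with Python's standard library. The description above does not give the schema, so every element and attribute name here (knowledge_base, entity, id, name, wiki_title, type, facts, fact, wiki_text) is an assumption made for illustration, and the sample records are invented placeholders rather than data from the corpus. Check the linked sample file for the actual layout before relying on any of these names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: element/attribute names and values are assumptions,
# not taken from the LDC documentation or the corpus itself.
SAMPLE_XML = """
<knowledge_base>
  <entity id="E0000001" name="Example Person" wiki_title="Example_Person" type="PER">
    <facts class="Infobox Person">
      <fact name="birthplace">Example City</fact>
    </facts>
    <wiki_text>Example Person is a placeholder entry.</wiki_text>
  </entity>
  <entity id="E0000002" name="Example City" wiki_title="Example_City" type="GPE">
    <facts class="Infobox Settlement">
      <fact name="country">Exampleland</fact>
    </facts>
    <wiki_text>Example City is a placeholder entry.</wiki_text>
  </entity>
</knowledge_base>
"""

# The four entity types named in the dataset description.
VALID_TYPES = {"PER", "ORG", "GPE", "UKN"}


def parse_entities(xml_text):
    """Return one dict per <entity> element found in xml_text."""
    root = ET.fromstring(xml_text)
    entities = []
    for node in root.iter("entity"):
        entity_type = node.get("type")
        if entity_type not in VALID_TYPES:
            raise ValueError(f"unexpected entity type: {entity_type!r}")
        # Join nested text so facts that contain child markup are kept whole.
        facts = {
            fact.get("name"): "".join(fact.itertext()).strip()
            for fact in node.iter("fact")
        }
        entities.append({
            "id": node.get("id"),
            "name": node.get("name"),
            "wiki_title": node.get("wiki_title"),
            "type": entity_type,
            "facts": facts,
            "wiki_text": (node.findtext("wiki_text") or "").strip(),
        })
    return entities


def count_types(entities):
    """Count how many entities fall under each entity type."""
    return Counter(entity["type"] for entity in entities)


entities = parse_entities(SAMPLE_XML)
for entity in entities:
    print(entity["id"], entity["type"], entity["name"], entity["facts"])
print(dict(count_types(entities)))
```

For the full knowledge base (over 800,000 entities), loading whole files into memory this way may be impractical; a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be a better fit.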
Provider:
Linguistic Data Consortium
Created:
2020-11-30