
News Sub-domain Named Entity Recognition

Source: DataCite Commons. Updated 2025-05-06; indexed 2025-05-17.
Download link:
https://catalog.ldc.upenn.edu/LDC2023T12
Resource description:
<h3>Introduction</h3>
<p>News Sub-domain Named Entity Recognition (LDC2023T12) was developed at the University of Pennsylvania and contains over 20,000 English news sentences annotated with named entities and categorized into sub-domains. The sentences were extracted from&nbsp;<a href="../../../LDC2008T19">The New York Times Annotated Corpus (LDC2008T19)</a>, which comprises over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007, with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff at nytimes.com.</p>
<h3>Data</h3>
<p>Sentences were selected from different years and topics according to the metadata provided in the New York Times corpus. Named entity annotation follows the&nbsp;<a href="https://paperswithcode.com/dataset/conll-2003">CoNLL-2003 guidelines and annotation scheme</a>. Sentences were labeled with person (PER), location (LOC) and organization (ORG) tags using phrase matching, followed by a manual second pass. The sub-domains are Arts (+Weekend/Cultural), Business (+Financial), Classifieds (+Obituary), Editorial, Foreign, Metropolitan, Sports and Others. "Others" includes topics such as Real Estate, New Jersey Weekly, Book Review, Job Market, Science, and Health &amp; Fitness.</p>
<p>Except for document ID lines, each line in the annotation files contains two tab-separated columns: the word in the first column and its tag in the second. Following the CoNLL convention, tags take the form B-TYPE, I-TYPE or O, where TYPE is PER, LOC or ORG.</p>
<p>Annotation and source text files are provided as plain text (.txt).</p>
<h3>Samples</h3>
<p><a href="desc/addenda/LDC2023T12.txt">TXT file</a></p>
<h3>Updates</h3>
<p>None at this time.</p>
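To illustrate the two-column, tab-separated annotation format described above, here is a minimal Python sketch that reads annotation lines and turns each sentence's tag sequence into entity spans. It is not part of the LDC release, and `read_annotations` and `entity_spans` are hypothetical names. It assumes two details the documentation does not state: blank lines separate sentences, and any non-blank line without a tab is a document ID line. Because the description does not say whether B- marks every entity start (IOB2) or only an entity that directly follows another of the same type (IOB1), the span converter is written to accept either convention.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# A sentence is a list of (word, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_annotations(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], Sentence]]:
    """Yield (doc_id, sentence) pairs from tab-separated annotation lines.

    Assumptions (not confirmed by the corpus documentation):
    - a blank line ends a sentence;
    - a non-blank line without a tab is a document ID line.
    """
    doc_id: Optional[str] = None
    sentence: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Assumed sentence boundary.
            if sentence:
                yield doc_id, sentence
                sentence = []
            continue
        if "\t" not in line:
            # Assumed document ID line: close any open sentence first.
            if sentence:
                yield doc_id, sentence
                sentence = []
            doc_id = line.strip()
            continue
        word, tag = line.split("\t", 1)  # column 1: word, column 2: tag
        sentence.append((word, tag.strip()))
    if sentence:
        yield doc_id, sentence


def entity_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert B-/I-/O tags into (TYPE, start, end) spans, end exclusive.

    A B- tag always starts a new entity; an I- tag continues the current
    entity only if the TYPE matches, otherwise it starts a new one. This
    handles both IOB1- and IOB2-style sequences.
    """
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if etype is not None and not continues:
            spans.append((etype, start, i))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label
    return spans


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the corpus.
    sample = [
        "doc-0001\n",
        "John\tB-PER\n",
        "Smith\tI-PER\n",
        "visited\tO\n",
        "Philadelphia\tB-LOC\n",
        "\n",
    ]
    for doc_id, sent in read_annotations(sample):
        words = [w for w, _ in sent]
        tags = [t for _, t in sent]
        for label, s, e in entity_spans(tags):
            print(doc_id, label, " ".join(words[s:e]))
```

On the hypothetical sample this prints `doc-0001 PER John Smith` and `doc-0001 LOC Philadelphia`. To read a real file, pass an open file object such as `open(path, encoding="utf-8")`; the UTF-8 encoding is also an assumption.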
Provider:
Linguistic Data Consortium
Created:
2023-11-16