
TDT5 Topics and Annotations

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2006T19
Description:
<h3>Introduction</h3>
<p>TDT5 Topics and Annotations was developed by the Linguistic Data Consortium (LDC) and includes about 10,000 topic relevance judgments and other associated information for the TDT5 2004 evaluation topics.</p>
<p>This release contains complete relevance judgments, including the results of adjudication, in which discrepancies between system submissions and LDC annotations are reviewed and the relevance judgments are updated. This release also contains answer keys for the link detection task.</p>
<p>The TDT5 corpora were created by the Linguistic Data Consortium with support from the DARPA TIDES (Translingual Information Detection, Extraction and Summarization) Program. The multilingual news text corresponding to this publication can be found in <a href="http://catalog.ldc.upenn.edu/LDC2006T18" rel="nofollow">TDT5 Multilingual News Text (LDC2006T18)</a>.</p>
<h3>Data</h3>
<p>A total of 250 topics, numbered 55001-55250, were annotated by LDC using a search-guided annotation technique. Details of the annotation process are described in the annotation task definition.</p>
<p>Approximately 25% of the topics are monolingual English (ENG), 25% are monolingual Mandarin Chinese (MAN), 25% are monolingual Arabic (ARB), and 25% are multilingual:</p>
<table>
<thead>
<tr>
<th>Topics</th>
<th>Language(s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>63</td>
<td>ENG</td>
</tr>
<tr>
<td>62</td>
<td>MAN</td>
</tr>
<tr>
<td>62</td>
<td>ARB</td>
</tr>
<tr>
<td>35</td>
<td>ARB ENG MAN</td>
</tr>
<tr>
<td>21</td>
<td>ENG MAN</td>
</tr>
<tr>
<td>7</td>
<td>ARB ENG</td>
</tr>
<tr>
<td>250</td>
<td>Total</td>
</tr>
</tbody>
</table>
<p>Broken down by language, counting both monolingual and multilingual topics:</p>
<table>
<thead>
<tr>
<th>Topics</th>
<th>Language</th>
</tr>
</thead>
<tbody>
<tr>
<td>126</td>
<td>ENG</td>
</tr>
<tr>
<td>118</td>
<td>MAN</td>
</tr>
<tr>
<td>104</td>
<td>ARB</td>
</tr>
</tbody>
</table>
<h3>Samples</h3>
<p>For an example of the data in this corpus, please review this <a href="desc/addenda/LDC2006T19.txt">sample (TXT)</a> from the link detection files.</p>
<h3>Updates</h3>
<p>None at this time.</p>
<p>Portions © 2004, 2006 Trustees of the University of Pennsylvania</p>
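The per-language totals in the description follow from the topic counts per language combination: a multilingual topic counts once for each of its languages. A minimal Python sketch of that arithmetic, using only the counts listed in the description (the names `combo_counts` and `per_language_totals` are illustrative and not part of the LDC release):

```python
from collections import Counter

# Topic counts per language combination, copied from the description's first table.
combo_counts = {
    ("ENG",): 63,
    ("MAN",): 62,
    ("ARB",): 62,
    ("ARB", "ENG", "MAN"): 35,
    ("ENG", "MAN"): 21,
    ("ARB", "ENG"): 7,
}


def per_language_totals(counts):
    """Count each topic once for every language it covers."""
    totals = Counter()
    for langs, n in counts.items():
        for lang in langs:
            totals[lang] += n
    return dict(totals)


total_topics = sum(combo_counts.values())
language_totals = per_language_totals(combo_counts)
multilingual_topics = sum(n for langs, n in combo_counts.items() if len(langs) > 1)

print(total_topics)         # 250
print(language_totals)      # {'ENG': 126, 'MAN': 118, 'ARB': 104}
print(multilingual_topics)  # 63
```

The 63 multilingual topics (35 + 21 + 7) account for the remaining quarter of the 250 topics, which matches the "approximately 25%" split stated in the description.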
Provider:
Linguistic Data Consortium
Created:
2020-11-30