BOLT Information Retrieval Comprehensive Training and Evaluation

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2018T18
Description:
<h3>Introduction</h3><br> <p>BOLT Information Retrieval Comprehensive Training and Evaluation was developed by the Linguistic Data Consortium (LDC) and consists of all data produced in support of the Information Retrieval (<a href="https://www.ldc.upenn.edu/collaborations/current-projects/bolt/information-retrieval">IR</a>) task within the DARPA Broad Operational Language Translation (BOLT) Program, including annotations, source documents and scoring software.</p><br> <p>The <a href="https://www.ldc.upenn.edu/collaborations/current-projects/bolt">BOLT</a> program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported BOLT by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference.</p><br> <p>The material in this release relates to the IR task, which sought to support development of systems that could: (1) take as input a natural language English query sentence; (2) return relevant responses to that query from a large corpus of informal documents in the three BOLT languages (Arabic, Chinese, and English); and (3) translate responses from non-English documents into English.</p><br> <h3>Data</h3><br> <p>BOLT Information Retrieval Comprehensive Training and Evaluation contains the pilot, dry run, and evaluation data developed for each phase of the BOLT IR task, including: (1) natural-language IR queries, system responses to queries, and manually generated assessment judgments for system responses; (2) discussion forum source documents in Arabic, Chinese and English; (3) scoring software for each evaluation phase; and (4) experimental data developed in Phase 2.</p><br> <p>Source data is presented as a series of zip archives containing XML files. 
Query and response data are also presented as XML. Judgments are included as tab-delimited files.</p><br> <h3>Acknowledgement</h3><br> <p>This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.</p><br> <h3>Samples</h3><br> <p>Please view the following samples:</p><br> <ul> <li><a href="desc/addenda/LDC2018T18.src.xml">Source Data</a></li> <li><a href="desc/addenda/LDC2018T18.que.xml">Query</a></li> <li><a href="desc/addenda/LDC2018T18.asses.xml">Assessment</a></li> <li><a href="desc/addenda/LDC2018T18.r-asses.xml">Response Assessment</a></li> </ul><br> <h3>Updates</h3><br> <p>None at this time.</p><br> <p>Portions © 2012-2016, 2018 Trustees of the University of Pennsylvania</p>
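The description above says that source documents ship as zip archives of XML files and that judgments are tab-delimited. As a minimal sketch of how such files might be read with the Python standard library, the helpers below walk the XML members of an archive and load tab-delimited rows. The function names `iter_xml_roots` and `read_tab_delimited` are hypothetical, and nothing here assumes the corpus's actual element names, column order, or directory layout; consult the documentation shipped with the release for those.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile


def iter_xml_roots(archive):
    """Yield (member_name, root_element) for every .xml member of a zip archive.

    `archive` may be a filesystem path or a binary file-like object.
    Element names are not assumed; inspect `root.tag` to learn the schema.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".xml"):
                with zf.open(name) as member:
                    yield name, ET.parse(member).getroot()


def read_tab_delimited(stream):
    """Return the non-empty rows of a tab-delimited text stream as lists of strings.

    Quote characters are kept literally (QUOTE_NONE), and no column names
    are assumed, because the judgment file schema is not given here.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]
```

With a local copy of the data, usage could look like `for name, root in iter_xml_roots("some_archive.zip"): print(name, root.tag)`, where the archive name is a placeholder and not a file name taken from the release.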
Provider:
Linguistic Data Consortium
Created:
2020-11-30