BOLT Information Retrieval Comprehensive Training and Evaluation

Name: BOLT Information Retrieval Comprehensive Training and Evaluation
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:32:07
License: No description available

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.

Download link:

https://catalog.ldc.upenn.edu/LDC2018T18

Resource description:

<h3>Introduction</h3>
<p>BOLT Information Retrieval Comprehensive Training and Evaluation was developed by the Linguistic Data Consortium (LDC) and consists of all data produced in support of the Information Retrieval (<a href="https://www.ldc.upenn.edu/collaborations/current-projects/bolt/information-retrieval">IR</a>) task within the DARPA Broad Operational Language Translation (BOLT) Program, including annotations, source documents and scoring software.</p>
<p>The <a href="https://www.ldc.upenn.edu/collaborations/current-projects/bolt">BOLT</a> program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported BOLT by collecting informal data sources (discussion forums, text messaging and chat) in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference.</p>
<p>The material in this release relates to the IR task, which sought to support development of systems that could: (1) take as input a natural language English query sentence; (2) return relevant responses to that query from a large corpus of informal documents in the three BOLT languages (Arabic, Chinese, and English); and (3) translate responses from non-English documents into English.</p>
<h3>Data</h3>
<p>BOLT Information Retrieval Comprehensive Training and Evaluation contains the pilot, dry run, and evaluation data developed for each phase of the BOLT IR task, including: (1) natural-language IR queries, system responses to queries, and manually generated assessment judgments for system responses; (2) discussion forum source documents in Arabic, Chinese and English; (3) scoring software for each evaluation phase; and (4) experimental data developed in Phase 2.</p>
<p>Source data is presented as a series of zip archives containing XML files. Query and response data are also presented as XML. Judgments are included as tab-delimited files.</p>
<h3>Acknowledgement</h3>
<p>This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.</p>
<h3>Samples</h3>
<p>Please view the following samples:</p>
<ul>
<li><a href="desc/addenda/LDC2018T18.src.xml">Source Data</a></li>
<li><a href="desc/addenda/LDC2018T18.que.xml">Query</a></li>
<li><a href="desc/addenda/LDC2018T18.asses.xml">Assessment</a></li>
<li><a href="desc/addenda/LDC2018T18.r-asses.xml">Response Assessment</a></li>
</ul>
<h3>Updates</h3>
<p>None at this time.</p>
Portions © 2012-2016, 2018 Trustees of the University of Pennsylvania
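The Data section states that source documents ship as zip archives of XML files and that judgments are tab-delimited. The following is a minimal Python sketch of reading both kinds of file using only the standard library. The archive member name, XML element names, and column layout shown here are invented for illustration; the real schema and column order should be taken from the documentation included in the release.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_xml_roots(archive):
    """Yield (member_name, root_element) for each .xml member of a zip archive.

    `archive` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".xml"):
                with zf.open(name) as fh:
                    yield name, ET.parse(fh).getroot()


def read_tab_delimited(lines):
    """Parse tab-delimited lines into lists of fields, skipping blank lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


# In-memory demonstration with made-up content (NOT real corpus data;
# the element names and columns below are hypothetical).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example.xml", "<doc id='d1'><post>hello</post></doc>")
buf.seek(0)

docs = [(name, root.tag, root.get("id")) for name, root in iter_xml_roots(buf)]
rows = read_tab_delimited(io.StringIO("q1\tr1\tlabel\n\nq2\tr2\tlabel\n"))
```

For the actual release, `iter_xml_roots` would be pointed at one of the downloaded archives and `read_tab_delimited` at an opened judgment file. Quoting is disabled in the reader on the assumption that fields contain no quoted tabs.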

Provider:

Linguistic Data Consortium

Created:

2020-11-30
