QUASAR-S and QUASAR-T

Name: QUASAR-S and QUASAR-T
Creator: School of Computer Science, Carnegie Mellon University
Published: 2017-08-09 09:48:08
License: No description available

Source: arXiv. Updated: 2017-08-09. Indexed: 2024-06-21.

Download link:

https://github.com/bdhingra/quasar

Overview:

The QUASAR dataset comprises two subsets, QUASAR-S and QUASAR-T, designed to evaluate how well a system understands natural language queries and extracts answers from a large text corpus. QUASAR-S contains roughly 37,000 cloze-style (fill-in-the-blank) queries built from definitions of software entity tags on Stack Overflow. QUASAR-T contains roughly 43,000 open-domain trivia questions and their answers, collected from multiple Internet sources. Both subsets target two related subtasks of factoid question answering: searching for text passages that contain the correct answer, and reading the retrieved passages to answer the query. Together they address the problem of extracting correct answers from unstructured text, in computer programming (QUASAR-S) and general knowledge (QUASAR-T).
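
The two subtasks described above (search, then read) can be illustrated with a minimal Python sketch. It is not the authors' baseline and does not use the actual QUASAR file format: the query, passages, candidate list, and the helper names `tokenize`, `retrieve`, and `read` are all hypothetical, and the scoring is plain token overlap chosen only for brevity.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, passages, k=2):
    """Search step: rank passages by how many distinct query tokens they share."""
    q = set(tokenize(query))
    ranked = sorted(passages, key=lambda p: len(q & set(tokenize(p))), reverse=True)
    return ranked[:k]

def read(retrieved, candidates):
    """Reading step: pick the candidate mentioned most often in the retrieved passages.

    Assumes every candidate is a single token.
    """
    counts = Counter()
    for passage in retrieved:
        tokens = tokenize(passage)
        for cand in candidates:
            counts[cand] += tokens.count(cand.lower())
    return counts.most_common(1)[0][0]

# Toy, made-up data (not taken from QUASAR).
query = "@placeholder is a build automation tool for java projects"
passages = [
    "maven is a build automation tool used primarily for java projects",
    "numpy adds support for large arrays in python",
    "use maven to manage the build of a java project",
]
candidates = ["maven", "numpy"]

top = retrieve(query, passages)
answer = read(top, candidates)
print(answer)
```

A real system would replace both steps with stronger components (for example, an indexed search engine for retrieval and a trained reading-comprehension model for reading); the sketch only shows how the output of the search step feeds the reading step.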

Provider:

School of Computer Science, Carnegie Mellon University

Created:

2017-07-13
