QReCC
OpenDataLab | Updated 2026-05-24 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/QReCC
Description:
QReCC contains 14K conversations and 81K question-answer pairs. It is built on questions from TREC CAsT, QuAC, and Google Natural Questions (NQ). TREC CAsT and QuAC are multi-turn conversation datasets, but Natural Questions is not. Questions from NQ were therefore used as prompts to create conversations that explicitly balance context-dependent question types such as anaphora (coreference) and ellipsis. For each query, the authors collected a query rewrite by resolving its references, so that each rewrite is a context-independent version of the original, context-dependent question. The rewritten queries were then submitted to a search engine to answer the questions. Each query also carries an answer annotation linked to the web page used to produce the answer. Each conversation in the dataset has a unique Conversation_no, and each turn has a Turn_no that is unique within its conversation, along with the original Question, Context, Rewrite, Answer, and Answer_URL. Source: QReCC
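The record layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names come from the description, but the field types (for example, Context as a list of strings), the helper name `group_by_conversation`, and the two sample turns are hypothetical and are not taken from the dataset files.

```python
from collections import defaultdict
from typing import TypedDict


class QReCCTurn(TypedDict):
    """One turn, using the field names listed in the description.

    The types are assumptions made for this sketch.
    """
    Conversation_no: int
    Turn_no: int
    Question: str       # original, possibly context-dependent question
    Context: list[str]  # assumed: earlier questions and answers, in order
    Rewrite: str        # context-independent version of Question
    Answer: str
    Answer_URL: str


def group_by_conversation(turns):
    """Group turns by Conversation_no and order each group by Turn_no."""
    conversations = defaultdict(list)
    for turn in turns:
        conversations[turn["Conversation_no"]].append(turn)
    for conversation_turns in conversations.values():
        conversation_turns.sort(key=lambda t: t["Turn_no"])
    return dict(conversations)


# Invented sample turns for illustration only; not real QReCC records.
SAMPLE_TURNS: list[QReCCTurn] = [
    {
        "Conversation_no": 1,
        "Turn_no": 2,
        "Question": "When was it published?",
        "Context": ["Who wrote Dune?", "Frank Herbert"],
        "Rewrite": "When was Dune published?",
        "Answer": "1965",
        "Answer_URL": "https://example.org/dune",
    },
    {
        "Conversation_no": 1,
        "Turn_no": 1,
        "Question": "Who wrote Dune?",
        "Context": [],
        "Rewrite": "Who wrote Dune?",
        "Answer": "Frank Herbert",
        "Answer_URL": "https://example.org/dune",
    },
]

if __name__ == "__main__":
    for number, conversation in group_by_conversation(SAMPLE_TURNS).items():
        for turn in conversation:
            print(number, turn["Turn_no"], turn["Question"], "->", turn["Rewrite"])
```

In the second sample turn the pronoun "it" in Question is resolved to "Dune" in Rewrite, which is the kind of anaphora resolution the description refers to.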
Provider:
OpenDataLab
Created:
2022-05-23
Collected summary
Dataset introduction

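Background and challenges
Background overview
QReCC is an open-domain conversational question answering dataset containing 14,000 conversations and 81,000 question-answer pairs, built on TREC CAsT, QuAC, and Google Natural Questions. It focuses on question rewriting as a way to handle context-dependent questions in multi-turn conversations, such as those involving anaphora and ellipsis. Each conversation provides the original question, the context, the rewritten question, and a link to the answer source, which makes the dataset suitable for research in natural language processing and question answering systems.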
The summary above was collected and generated by 遇见数据集.



