ReCO
Source: arXiv | Published: 2020-06-22 | Updated: 2024-06-21
Download link:
https://github.com/benywon/ReCO
Description:
ReCO is a large-scale, human-curated Chinese reading comprehension dataset focused on opinion-based questions. It contains 300,000 questions, making it one of the largest Chinese reading comprehension datasets available. The questions come from real search-engine queries. Crowdworkers extract supporting snippets from the retrieved documents and give an abstractive answer: affirmative (yes), negative (no), or uncertain. A distinguishing feature of ReCO is that, in addition to the original context passages, it provides supporting evidence that can be used directly to answer each question. The dataset was built in five steps (query selection, document retrieval, document filtering, evidence extraction, and answer generation), a process intended to ensure high data quality and sufficient difficulty. ReCO is designed to challenge machine reading comprehension systems with questions that require deeper textual reasoning skills, such as causal and logical reasoning.
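To make the task format concrete, the sketch below models one ReCO-style example as a query, a full context passage, a crowd-extracted evidence snippet, and one of three answer labels. It also shows the two ways the description implies a model can be fed (evidence only, or the full passage) and a majority-class accuracy baseline. All names here (`RecoExample`, `LABELS`, `model_input`, the `[SEP]` separator, and the English label strings) are hypothetical and for illustration only. They are not the official file schema in the repository; check the released data files for the actual field names and label values.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical label strings for ReCO's three abstractive answer types.
LABELS = ("affirmative", "negative", "uncertain")


@dataclass
class RecoExample:
    """Hypothetical in-memory record; field names are illustrative, not the official schema."""
    query: str     # opinion question taken from a real search-engine query
    passage: str   # original retrieved context
    evidence: str  # supporting snippet extracted by crowdworkers
    answer: str    # one of LABELS

    def __post_init__(self) -> None:
        # Reject labels outside the assumed three-way answer space.
        if self.answer not in LABELS:
            raise ValueError(f"unknown answer label: {self.answer!r}")


def model_input(ex: RecoExample, use_evidence: bool = True) -> str:
    """Pair the query with either the evidence snippet or the full passage.

    The "[SEP]" separator is an assumption borrowed from common BERT-style inputs.
    """
    context = ex.evidence if use_evidence else ex.passage
    return f"{ex.query} [SEP] {context}"


def majority_baseline(train: List[RecoExample]) -> str:
    """Return the most frequent training label, a common trivial baseline."""
    return Counter(ex.answer for ex in train).most_common(1)[0][0]


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    # Toy records invented for illustration; they are NOT taken from ReCO.
    train = [
        RecoExample("q1", "p1", "e1", "affirmative"),
        RecoExample("q2", "p2", "e2", "affirmative"),
        RecoExample("q3", "p3", "e3", "negative"),
    ]
    test = [
        RecoExample("q4", "p4", "e4", "affirmative"),
        RecoExample("q5", "p5", "e5", "uncertain"),
    ]
    label = majority_baseline(train)
    preds = [label] * len(test)
    print(label, accuracy([ex.answer for ex in test], preds))
```

Comparing a model trained on `model_input(ex, use_evidence=True)` against one trained on the full passage is one simple way to measure how much the provided evidence eases the reasoning burden.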
Provided by:
Sogou Inc.
Created:
2020-06-22



