KORC
Source: arXiv | Updated: 2023-07-07 | Indexed: 2024-06-21
Download link:
https://github.com/THU-KEG/KoRC
Overview:
KORC (Knowledge-Oriented Reading Comprehension) is a benchmark for deep text understanding developed by the Institute for Artificial Intelligence, Tsinghua University. It contains 31,804 questions covering a broad range of knowledge domains, and question construction is guided by large-scale knowledge bases. Answers are not restricted to options or text spans taken from the passage; instead, they are labels drawn from the knowledge bases. The dataset was built with three annotation methods: template-based generation, human annotation, and annotation by large language models. KORC targets machine reading comprehension, in particular settings where a model must combine information stated in the text with background knowledge, and is intended to address the challenges of deep text understanding.
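Because answers are knowledge-base labels rather than spans or options, a question can have several correct labels, and a prediction is naturally compared as a set. This entry does not say how KORC scores answers, so the following is only a minimal sketch assuming set-level exact match and F1 over lightly normalized labels. The function names `normalize`, `label_set_f1`, and `exact_match` are hypothetical; consult the official repository for the actual evaluation script.

```python
def normalize(label: str) -> str:
    """Case-fold and collapse whitespace so trivial surface variants compare equal.
    (Assumed normalization; the official scorer may differ.)"""
    return " ".join(label.casefold().split())


def label_set_f1(predicted: list[str], gold: list[str]) -> float:
    """Set-level F1 between predicted and gold knowledge-base labels (assumed metric)."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0  # covers the case where only one side is empty
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def exact_match(predicted: list[str], gold: list[str]) -> bool:
    """True only if the normalized predicted label set equals the gold set."""
    return {normalize(p) for p in predicted} == {normalize(g) for g in gold}


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    print(label_set_f1(["Paris", "Lyon"], ["paris"]))
    print(exact_match([" Paris "], ["paris"]))
```

Treating answers as sets means an extra wrong label lowers precision and a missing correct label lowers recall, which a single-span string comparison would not capture.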
Provided by:
Institute for Artificial Intelligence, Tsinghua University
Created:
2023-07-07