
KORC

Source: arXiv | Updated: 2023-07-07 | Indexed: 2024-06-21
Download link:
https://github.com/THU-KEG/KoRC
Description:
KORC is a knowledge-oriented reading comprehension benchmark developed by the Institute of Artificial Intelligence, Tsinghua University, aimed at deep text understanding. It contains 31,804 questions spanning a wide range of knowledge domains, and question construction is guided by a large-scale knowledge base. Answers are not restricted to multiple-choice options or to spans extracted from the text; they are labels drawn from the knowledge base. The dataset was built with three annotation methods: template-based generation, human annotation, and large language model annotation. KORC targets machine reading comprehension, particularly settings that require combining information in the text with background knowledge.
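
As a rough illustration of the answer format described above, the sketch below scores a prediction against a set of knowledge-base labels instead of a text span. The `Example` record, its field names, and the toy passage are assumptions made up for this illustration; they are not the schema or contents of the released files, and the matching rule is a simple normalized exact match, not the benchmark's official metric.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical record layout; NOT the schema of the released files."""
    document: str
    question: str
    answers: set[str]  # knowledge-base labels; need not appear in `document`


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different labels match."""
    return " ".join(label.lower().split())


def is_correct(prediction: str, example: Example) -> bool:
    """True if the prediction equals any gold label after normalization."""
    gold = {normalize(a) for a in example.answers}
    return normalize(prediction) in gold


def answer_in_text(example: Example) -> bool:
    """True if at least one gold label occurs verbatim in the document."""
    text = normalize(example.document)
    return any(normalize(a) in text for a in example.answers)


# Invented toy example, for illustration only.
toy = Example(
    document="Marie Curie was born in Warsaw and later moved to Paris.",
    question="In which country was the person in the passage born?",
    answers={"Poland"},
)
```

In this toy case the gold label never appears in the passage, so a span-extraction system could not produce it; answering requires combining the passage with background knowledge, which is the setting the description says the benchmark targets.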
Provider:
Institute of Artificial Intelligence, Tsinghua University
Created:
2023-07-07