Native Chinese Reader (NCR): Chinese Middle and High School Reading Comprehension Dataset

Source: Qianyan (千言) datasets. Indexed 2024-05-15.
Download link:
https://www.luge.ai/#/luge/dataDetail?id=57
Description:
NCR is a Chinese reading comprehension dataset built from middle and high school Chinese-language reading comprehension questions. A sample typically consists of one long article followed by several questions about it. The average article length is 1024, which tests a model's ability to extract information from long texts. The questions cover word and phrase comprehension, paragraph summarization, logical reasoning, sentiment analysis, the background of a work's creation, and other aspects. Human evaluation shows that the questions are fairly difficult even for native Chinese speakers, and that a large gap remains between current AI models and human performance. The data comes from middle and high school Chinese reading comprehension questions publicly available online; after manual screening and classification, the dataset contains 8388 articles and 20477 questions in total.
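The sample layout described above (one long article followed by several questions) can be sketched as a small data structure. This is only an illustration: the class and field names (`Sample`, `Question`, `article`, `text`) are hypothetical and do not reflect the dataset's actual file schema, and measuring length in characters via `len()` is an assumption, since the description does not state the unit of the 1024 figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    # Hypothetical field: the question text only; the real schema may differ.
    text: str


@dataclass
class Sample:
    # Hypothetical layout: one article plus the questions asked about it.
    article: str
    questions: List[Question] = field(default_factory=list)


def average_article_length(samples: List[Sample]) -> float:
    """Mean article length, counted in characters (an assumed unit)."""
    return sum(len(s.article) for s in samples) / len(samples)


def questions_per_article(num_questions: int, num_articles: int) -> float:
    """Average number of questions attached to each article."""
    return num_questions / num_articles


# Totals quoted in the description: 20477 questions over 8388 articles.
ratio = questions_per_article(20477, 8388)
print(f"{ratio:.2f} questions per article on average")
```

With the totals quoted in the description, this works out to roughly 2.4 questions per article.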
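Provided by:
Institute for Interdisciplinary Information Sciences, Tsinghua University; New York University; Shenzhen University; Peking University; Tsinghua University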
The content above was collected and summarized by 遇见数据集.