Chinese Reading Comprehension Datasets
Dataset Overview
Contents
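| Section | Description |
|---|---|
| Chinese Reading Comprehension Datasets | Describes publicly available Chinese reading comprehension datasets |
| State-of-the-art Systems | Lists the top systems (published and unpublished) on these datasets |
| Chinese Reading Comprehension Evaluations and Competitions | Introduces Chinese reading comprehension competitions |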
Chinese Reading Comprehension Datasets
| Dataset | Genre | Query Type | Answer Type | # Documents | # Queries | Download |
|---|---|---|---|---|---|---|
| People Daily & Children's Fairy Tale [1] | News & fairy tale | Cloze | Word | 28K | 100K | link |
| WebQA [2] | Web | User log | Entity | - | 42K | link |
| CMRC 2017 [3] | News | Cloze & query | Word | - | 364K | link |
| DuReader [4] | Web | User log | Free-form | 1M | 200K | link |
| CMRC 2018 [5] | Wiki | Query | Span | - | 18K | link |
| DRCD [6]<sup>(Traditional Chinese)</sup> | Wiki | Query | Span | - | 34K | link |
| C^3 [7] | Mixed | Query | Multiple choice | 14K | 24K | link |
| CMRC 2019 [8] | Story | Cloze | Sentence | 1K | 100K | link |
| ChID [9] | Various | Cloze | Idiom | 580K | 729K | link |
State-of-the-art Systems
People Daily & Children's Fairy Tale
| System | PD-DEV | PD-TEST | CFT-TEST-AUTO | CFT-TEST-HUMAN | Notes |
|---|---|---|---|---|---|
| SAW Reader (Zhang et al., 2018) | 72.8 | 75.1 | - | 43.8 | - |
| CAW Reader (Zhang et al., 2018) | 69.4 | 70.5 | - | 39.7 | - |
| CAS Reader (Cui et al., 2016) | 65.2 | 68.1 | 41.3 | 35.0 | - |
| AS Reader (Cui et al., 2016) | 64.1 | 67.2 | 40.9 | 33.1 | - |
CMRC 2017
Cloze Track
| System | DEV | TEST | Notes |
|---|---|---|---|
| 6ESTATES PTE LTD (ensemble) | 81.85 | 81.90 | - |
| SJTU BCMI-NLP (ensemble) | 78.35 | 80.67 | - |
| YunSiChuangZhi (ensemble) | 79.20 | 80.27 | - |
| SAW Reader (Zhang et al., 2018) | 78.95 | 78.80 | - |
| CAW Reader (Zhang et al., 2018) | 77.95 | 78.50 | - |
| Word + Char + BPE-FRQ (Zhang et al., 2018) | 79.05 | 78.83 | - |
User Query Track
| System | DEV | TEST | Notes |
|---|---|---|---|
| ECNU (ensemble) | 90.45 | 69.53 | - |
| SXU-3 (single model) | 47.80 | 49.07 | - |
| ZZU (single model) | 31.10 | 32.53 | - |
DuReader
| System | ROUGE-L | BLEU-4 | Notes |
|---|---|---|---|
| AliReader | 63.48 | 61.54 | - |
| NI-Reader (ensemble) | 63.38 | 59.23 | - |
| mrc_try_mingyan (single model) | 62.20 | 59.72 | - |
| Yan et al., 2018 | 50.71 | 49.39 | - |
| Li et al., 2018 | 44.95 | 42.68 | - |
| Wang et al., 2018 | 44.18 | 40.97 | - |
| Xu et al., 2018 | 39.60 | 34.76 | - |
| Match-LSTM (He et al., 2018) | 39.2 | 31.9 | - |
| BiDAF (He et al., 2018) | 39.0 | 31.8 | - |
CMRC 2018
| System | DEV-EM | DEV-F1 | TEST-EM | TEST-F1 | CHALLENGE-EM | CHALLENGE-F1 | Notes |
|---|---|---|---|---|---|---|---|
| P-Reader (single model) | 59.894 | 81.499 | 65.189 | 84.386 | 15.079 | 39.583 | - |
| GM-Reader (ensemble) | 58.931 | 80.069 | 64.045 | 83.046 | 15.675 | 37.315 | - |
| MCA-Reader (ensemble) | 66.698 | 85.538 | 71.175 | 88.090 | 15.476 | 37.104 | - |
| Z-Reader (single model) | 79.776 | 92.696 | 74.178 | 88.145 | 13.889 | 37.422 | - |
| SRC->DS(±) (Yang et al., 2019) | 49.2 | 65.4 | - | - | - | - | - |
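
The EM and F1 columns in the CMRC 2018 table above are span-extraction metrics. As a rough illustration of what they measure, here is a minimal sketch of exact match and a character-overlap F1. It follows the bag-of-tokens style popularized by English span-extraction benchmarks, applied to characters; it is not the official CMRC 2018 or DRCD scoring script, which may normalize punctuation and define overlap differently. The function names are hypothetical.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the strings are identical after trimming whitespace, else 0.0."""
    return float(prediction.strip() == reference.strip())


def char_f1(prediction: str, reference: str) -> float:
    """Character-overlap F1 between a predicted span and a reference span."""
    pred_chars = [c for c in prediction if not c.isspace()]
    ref_chars = [c for c in reference if not c.isspace()]
    if not pred_chars or not ref_chars:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_chars == ref_chars)
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction, references):
    """Score against several gold answers and keep the best one."""
    return max(metric(prediction, ref) for ref in references)


if __name__ == "__main__":
    golds = ["北京", "北京市"]
    print(best_over_references(exact_match, "北京市", golds))
    print(best_over_references(char_f1, "北京市区", golds))
```

Taking the maximum over references reflects the usual practice when a question has several acceptable gold answers; per-question scores are then averaged and reported as percentages.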
DRCD
| System | DEV-EM | DEV-F1 | TEST-EM | TEST-F1 | Notes |
|---|---|---|---|---|---|
| SRC + DS(±) (Yang et al., 2019) | 55.4 | 67.7 | - | - | - |
| r-net (single model) | - | - | 29.1 | 44.4 | - |
C^3
| System | DEV-1A | TEST-1A | DEV-1B | TEST-1B | DEV-2A | TEST-2A | DEV-2B | TEST-2B | Notes |
|---|---|---|---|---|---|---|---|---|---|
| BERT_CN (Sun et al., 2019) | 63.0 | 62.6 | 62.3 | 62.1 | 36.7 | 26.2 | 34.7 | 31.3 | - |
Chinese Reading Comprehension Evaluations and Competitions
- The First Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2017)
  - Hosted by: CIPS-CL, Joint Laboratory of HIT and iFLYTEK Research (HFL), iFLYTEK Co. Ltd
  - Task types: cloze-style RC, user-query RC
- The Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018)
  - Hosted by: CIPS-CL, Joint Laboratory of HIT and iFLYTEK Research (HFL), iFLYTEK Co. Ltd
  - Task type: span-extraction RC
- 2018 NLP Challenge on Machine Reading Comprehension
  - Hosted by: CCF, CIPSC, Baidu Inc.
  - Task type: open-domain RC
- CIPS-SOGOU QA Competition
  - Hosted by: CIPSC, SOGOU
  - Task types: factoid QA, non-factoid QA
- The Third Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2019)
  - Hosted by: CIPS-CL, Joint Laboratory of HIT and iFLYTEK Research (HFL), iFLYTEK Co. Ltd
  - Task type: sentence cloze
- 2019 NLP Language and Intelligence Challenge
  - Hosted by: CCF, CIPSC, Baidu Inc.
  - Task type: open-domain RC
- Chinese Idiom Understanding Contest
  - Hosted by: CCF, Tsinghua University
  - Task type: cloze test




