ceval-exam
ModelScope Community | Updated 2026-05-17 | Indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/opencompass/ceval-exam
Resource overview:
C-Eval is a comprehensive Chinese evaluation suite for foundation models. It consists of 13,948 multiple-choice questions spanning 52 diverse disciplines and four difficulty levels. Please visit our [website](https://cevalbenchmark.com/) and [GitHub](https://github.com/SJTU-LIT/ceval/tree/main) or check our [paper](https://arxiv.org/abs/2305.08322) for more details.
Each subject has three splits: dev, val, and test. The dev set of each subject contains five exemplars with explanations for few-shot evaluation, the val set is intended for hyperparameter tuning, and the test set is for model evaluation. Labels for the test split are not released; users must submit their predictions to obtain test accuracy automatically. [How to submit?](https://github.com/SJTU-LIT/ceval/tree/main#how-to-submit)
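As a rough illustration of how the dev exemplars can be used for few-shot evaluation, the sketch below prepends up to five answered dev rows to one unanswered question. It assumes each row is a dict with the fields shown in the loading example (`question`, `A` to `D`, `answer`, `explanation`). The prompt wording and the helper names `format_row` and `build_few_shot_prompt` are hypothetical and are not the official C-Eval template.

```python
from typing import Dict, List

Row = Dict[str, str]
CHOICES = ["A", "B", "C", "D"]


def format_row(row: Row, include_answer: bool, with_explanation: bool = False) -> str:
    """Render one C-Eval style row as a question block (hypothetical template)."""
    lines = [row["question"]] + [f"{c}. {row[c]}" for c in CHOICES]
    if include_answer:
        # Optionally show the exemplar's explanation before its answer (chain-of-thought style).
        if with_explanation and row.get("explanation"):
            lines.append(f"Explanation: {row['explanation']}")
        lines.append(f"Answer: {row['answer']}")
    else:
        # Leave the answer slot open for the model to complete.
        lines.append("Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(dev_rows: List[Row], target: Row, subject: str,
                          k: int = 5, with_explanation: bool = False) -> str:
    """Prefix up to k answered dev exemplars to one unanswered target question."""
    header = (f"The following are multiple-choice questions about {subject}. "
              "Choose the correct answer.\n\n")
    shots = [format_row(r, include_answer=True, with_explanation=with_explanation)
             for r in dev_rows[:k]]
    return header + "\n\n".join(shots + [format_row(target, include_answer=False)])


# Toy rows shaped like C-Eval records (not real dataset content).
dev = [{"question": "1 + 1 = ?", "A": "1", "B": "2", "C": "3", "D": "4",
        "answer": "B", "explanation": "1 + 1 equals 2."}]
target = {"question": "2 + 2 = ?", "A": "3", "B": "4", "C": "5", "D": "6",
          "answer": "B", "explanation": ""}
print(build_few_shot_prompt(dev, target, subject="arithmetic", with_explanation=True))
```

Setting `with_explanation=True` uses the dev explanations as worked reasoning; leaving it `False` gives a plain answer-only few-shot prompt. The target row's own `answer` is never rendered, so the same helper works for test rows whose labels are withheld.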
### Load the data
```python
from modelscope.msdatasets import MsDataset
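# Load the computer_network subject of C-Eval from ModelScope
ds = MsDataset.load('opencompass/ceval-exam', subset_name="computer_network")
# Print the first record of the validation split
print(ds['val'][0])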
# {'id': 0, 'question': '使用位填充方法,以01111110为位首flag,数据为011011111111111111110010,求问传送时要添加几个0____', 'A': '1', 'B': '2', 'C': '3', 'D': '4', 'answer': 'C', 'explanation': ''}
```
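Because the val split ships with labels, it can be scored locally before test predictions are submitted. The sketch below is a minimal example, assuming each row exposes an `answer` field holding a letter from `A` to `D`, as in the printed record. The regex heuristic in the hypothetical `extract_choice` helper and the `accuracy` helper are illustrative assumptions, not part of C-Eval's tooling.

```python
import re
from typing import Any, Dict, List, Optional

# First capital A-D not adjacent to another Latin letter, so "答案是C" matches C
# while the "A" at the start of "Answer" is skipped.
_CHOICE_RE = re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])")


def extract_choice(reply: str) -> Optional[str]:
    """Return the first standalone option letter in a model reply, or None."""
    match = _CHOICE_RE.search(reply)
    return match.group(1) if match else None


def accuracy(rows: List[Dict[str, Any]], replies: List[str]) -> float:
    """Fraction of rows whose extracted choice equals the gold `answer` letter."""
    if len(rows) != len(replies):
        raise ValueError("rows and replies must have the same length")
    if not rows:
        return 0.0
    correct = sum(extract_choice(reply) == row["answer"]
                  for row, reply in zip(rows, replies))
    return correct / len(rows)


# Toy val rows keeping only the fields used here; real rows would come from ds['val'].
val_rows = [{"id": 0, "answer": "C"}, {"id": 1, "answer": "A"}, {"id": 2, "answer": "D"}]
replies = ["答案是C。", "Answer: B", "D"]
print(accuracy(val_rows, replies))  # 2 of 3 replies match, about 0.667
```

The lookaround pattern is used instead of `\b` because Python treats CJK characters as word characters, so `\b` would not find a boundary in a reply such as "答案是C". It is still a heuristic: a reply like "A good choice is C" would be read as `A`.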
More details on loading and using the data are at our [github page](https://github.com/SJTU-LIT/ceval#data).
Please cite our paper if you use our dataset.
```
@article{huang2023ceval,
title={C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models},
author={Huang, Yuzhen and Bai, Yuzhuo and Zhu, Zhihao and Zhang, Junlei and Zhang, Jinghan and Su, Tangjun and Liu, Junteng and Lv, Chuancheng and Zhang, Yikai and Lei, Jiayi and Fu, Yao and Sun, Maosong and He, Junxian},
journal={arXiv preprint arXiv:2305.08322},
year={2023}
}
```
Provided by:
maas
Created:
2024-05-12
Curated summary
Dataset introduction

Background and challenges
Background overview
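C-Eval is a comprehensive Chinese evaluation suite for foundation models. It contains 13,948 multiple-choice questions covering 52 disciplines and four difficulty levels, and is designed to assess model capabilities broadly. The dataset is divided into dev, val, and test splits, used for few-shot evaluation, hyperparameter tuning, and model evaluation, respectively. Test-set labels are not public, so test accuracy is obtained by submitting results for automatic scoring.
The above content was collected and summarized by 遇见数据集.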



