ceval-exam

Name: ceval-exam
Creator: maas
Published: 2026-07-15 16:31:55
License: not specified

ModelScope community; updated 2026-07-15; indexed 2024-05-15

Download link:

https://modelscope.cn/datasets/opencompass/ceval-exam

Resource description:

C-Eval is a comprehensive Chinese evaluation suite for foundation models. It consists of 13,948 multiple-choice questions spanning 52 disciplines and four difficulty levels. Please visit our [website](https://cevalbenchmark.com/) and [GitHub](https://github.com/SJTU-LIT/ceval/tree/main) or check our [paper](https://arxiv.org/abs/2305.08322) for more details.

Each subject consists of three splits: dev, val, and test. The dev set for each subject contains five exemplars with explanations for few-shot evaluation. The val set is intended for hyperparameter tuning, and the test set is for model evaluation. Labels for the test split are not released; users must submit their results to obtain test accuracy automatically. [How to submit?](https://github.com/SJTU-LIT/ceval/tree/main#how-to-submit)

### Load the data

```python
from modelscope.msdatasets import MsDataset

# Load one subject (subset) of C-Eval; the result is indexed by split name.
dataset = MsDataset.load('opencompass/ceval-exam', subset_name="computer_network")

print(dataset['val'][0])
# {'id': 0, 'question': '使用位填充方法，以01111110为位首flag，数据为011011111111111111110010，求问传送时要添加几个0____', 'A': '1', 'B': '2', 'C': '3', 'D': '4', 'answer': 'C', 'explanation': ''}
```

More details on loading and using the data are at our [GitHub page](https://github.com/SJTU-LIT/ceval#data).

Please cite our paper if you use our dataset.

```
@article{huang2023ceval,
  title={C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models},
  author={Huang, Yuzhen and Bai, Yuzhuo and Zhu, Zhihao and Zhang, Junlei and Zhang, Jinghan and Su, Tangjun and Liu, Junteng and Lv, Chuancheng and Zhang, Yikai and Lei, Jiayi and Fu, Yao and Sun, Maosong and He, Junxian},
  journal={arXiv preprint arXiv:2305.08322},
  year={2023}
}
```
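The intended use of the splits (dev exemplars as few-shot context, val for scoring against released labels) can be sketched roughly as follows. This is a minimal illustration, not the official C-Eval evaluation code: the helper names `format_example`, `build_few_shot_prompt`, and `accuracy` are hypothetical, and the "Answer:" prompt template is an assumption. Only the record layout (`question`, `A` to `D`, `answer`) comes from the sample output above. The val record is reused as a stand-in for a dev exemplar so that the sketch runs without downloading the data.

```python
from typing import Any, Mapping, Sequence

CHOICES = ("A", "B", "C", "D")


def format_example(record: Mapping[str, Any], include_answer: bool) -> str:
    """Render one record as the question, its four options, and an answer line."""
    lines = [record["question"]]
    lines += [f"{choice}. {record[choice]}" for choice in CHOICES]
    # Assumed template: the answer line is left open for the question being asked.
    lines.append(f"Answer: {record['answer']}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(dev_records: Sequence[Mapping[str, Any]],
                          target: Mapping[str, Any]) -> str:
    """Prepend the answered dev exemplars to the unanswered target question."""
    shots = [format_example(r, include_answer=True) for r in dev_records]
    return "\n\n".join(shots + [format_example(target, include_answer=False)])


def accuracy(predictions: Sequence[str],
             records: Sequence[Mapping[str, Any]]) -> float:
    """Fraction of predicted letters that match the released labels."""
    if not records or len(predictions) != len(records):
        raise ValueError("predictions and records must be non-empty and equal in length")
    correct = sum(pred == rec["answer"] for pred, rec in zip(predictions, records))
    return correct / len(records)


# The val record shown in the sample output, reused here as a stand-in dev exemplar.
sample = {
    "id": 0,
    "question": "使用位填充方法，以01111110为位首flag，数据为011011111111111111110010，求问传送时要添加几个0____",
    "A": "1", "B": "2", "C": "3", "D": "4",
    "answer": "C",
    "explanation": "",
}

prompt = build_few_shot_prompt([sample], sample)
print(prompt)
print(accuracy(["C"], [sample]))
```

With real data, `dev_records` would be the five dev exemplars of a subject and `records` its val split. The test split has no released labels, so `accuracy` cannot be computed for it locally.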


Provider:

maas

Created:

2024-05-12

Compiled summary

Dataset introduction