ceval
Source: ModelScope community. Updated 2026-05-24; indexed 2025-08-16.
Download link:
https://modelscope.cn/datasets/evalscope/ceval
Resource description:
C-Eval is a comprehensive Chinese evaluation suite for foundation models. It consists of 13948 multi-choice questions spanning 52 diverse disciplines and four difficulty levels. Please visit our [website](https://cevalbenchmark.com/) and [GitHub](https://github.com/SJTU-LIT/ceval/tree/main) or check our [paper](https://arxiv.org/abs/2305.08322) for more details.
Each subject consists of three splits: dev, val, and test. The dev set for each subject contains five exemplars with explanations for few-shot evaluation. The val set is intended for hyperparameter tuning, and the test set is for model evaluation.
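Because each subject's dev split holds five exemplars for few-shot evaluation, a few-shot prompt can be assembled by prepending those exemplars to the question being asked. The sketch below is a minimal illustration assuming rows shaped like the loading example (`question`, `A` to `D`, `answer`). The prompt layout, the `Answer:` label, and the helper names `format_question` and `build_few_shot_prompt` are hypothetical and are not C-Eval's official prompt.

```python
from typing import Dict, List

CHOICE_KEYS = ("A", "B", "C", "D")


def format_question(row: Dict[str, str], with_answer: bool) -> str:
    """Render one C-Eval row as a question block (hypothetical layout)."""
    lines = [row["question"]]
    lines.extend(f"{key}. {row[key]}" for key in CHOICE_KEYS)
    # Exemplars show their gold letter; the target question leaves it open.
    lines.append(f"Answer: {row['answer']}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(
    dev_rows: List[Dict[str, str]], target: Dict[str, str], k: int = 5
) -> str:
    """Prepend up to k dev exemplars (C-Eval ships five per subject) to the target."""
    blocks = [format_question(row, with_answer=True) for row in dev_rows[:k]]
    blocks.append(format_question(target, with_answer=False))
    return "\n\n".join(blocks)
```

The dev exemplars also carry an `explanation` field, which a chain-of-thought variant could append after each exemplar's answer; this sketch omits it for brevity.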
#### [2025.7.27] We have released the complete C-Eval test set to the community! You can now evaluate directly on the C-Eval test set.
### Load the data
```python
from datasets import load_dataset

# Load one subject (config name) of C-Eval; each subject has dev, val, and test splits.
dataset = load_dataset("ceval/ceval-exam", name="computer_network")
print(dataset["val"][0])
# {'id': 0, 'question': '使用位填充方法,以01111110为位首flag,数据为011011111111111111110010,求问传送时要添加几个0____', 'A': '1', 'B': '2', 'C': '3', 'D': '4', 'answer': 'C', 'explanation': ''}
```
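Once a split is loaded, model predictions can be scored against each row's `answer` field. This is a minimal sketch assuming one already-extracted answer letter (A to D) per question; the `accuracy` helper is hypothetical, and parsing a letter out of free-form model output is deliberately left out.

```python
from typing import Mapping, Sequence


def accuracy(rows: Sequence[Mapping[str, str]], predictions: Sequence[str]) -> float:
    """Share of predicted letters that equal each row's gold `answer` field."""
    if len(rows) != len(predictions):
        raise ValueError("expected exactly one prediction per row")
    if not rows:
        return 0.0
    # Normalize whitespace and case so " b" and "B" count as the same letter.
    correct = sum(
        pred.strip().upper() == row["answer"] for row, pred in zip(rows, predictions)
    )
    return correct / len(rows)
```

With the dataset loaded as above, a call would look like `accuracy(list(dataset["val"]), predictions)`, where `predictions` is a list of letters produced by your model.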
More details on loading and using the data are on our [GitHub page](https://github.com/SJTU-LIT/ceval#data).
Please cite our paper if you use our dataset.
```
@article{huang2023ceval,
title={C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models},
author={Huang, Yuzhen and Bai, Yuzhuo and Zhu, Zhihao and Zhang, Junlei and Zhang, Jinghan and Su, Tangjun and Liu, Junteng and Lv, Chuancheng and Zhang, Yikai and Lei, Jiayi and Fu, Yao and Sun, Maosong and He, Junxian},
journal={arXiv preprint arXiv:2305.08322},
year={2023}
}
```
# C-Eval (Test-Split) in Unified JSONL Format
## Dataset Description
This project converts the `test` split of the [C-Eval dataset](https://huggingface.co/datasets/ceval/ceval-exam) into a **unified instruction-style JSONL format** to facilitate the evaluation and testing of Large Language Models (LLMs).
C-Eval is a comprehensive Chinese evaluation suite for foundation models, designed to measure the Chinese language and knowledge capabilities of language models. The data in this repository comes from the `test` split of the original `ceval` dataset and follows exactly the same processing pipeline and data structure as the CMMLU dataset, so users can evaluate on both under a unified framework.
## Data Format
The dataset is in JSONL format, where each line is a standalone JSON object. This structure is designed to fit standard instruction-tuning and inference workflows.
Each JSON object contains the following fields:
* `id`: Unique identifier of the sample.
* `instruction`: Instruction text that guides the model to answer the multiple-choice question.
* `choices`: A **dictionary** containing four options, with keys "A", "B", "C", and "D".
* `answer`: The correct answer to the question ('A', 'B', 'C', or 'D').
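A quick way to check that a file follows this layout is to parse each line independently and verify the four fields. The sketch assumes the schema listed above exactly (`choices` keyed by A, B, C, and D); `validate_record` and `iter_records` are hypothetical helpers, not part of this release.

```python
import json
from typing import Any, Dict, Iterable, Iterator

REQUIRED_FIELDS = ("id", "instruction", "choices", "answer")
CHOICE_KEYS = {"A", "B", "C", "D"}


def validate_record(record: Dict[str, Any]) -> None:
    """Raise ValueError if a record does not match the four-field schema."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    choices = record["choices"]
    if not isinstance(choices, dict) or set(choices) != CHOICE_KEYS:
        raise ValueError("choices must be a dict keyed exactly by A, B, C, D")
    if record["answer"] not in CHOICE_KEYS:
        raise ValueError(f"answer must be one of A-D, got {record['answer']!r}")


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse JSONL text: one standalone JSON object per non-blank line."""
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        try:
            validate_record(record)
        except ValueError as err:
            raise ValueError(f"line {line_no}: {err}") from err
        yield record
```

Since a file object iterates line by line, `iter_records(open("ceval_test.jsonl", encoding="utf-8"))` would stream and validate a whole file; the file name here is hypothetical.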
**Format Example:**
```json
{
  "id": "1",
  "instruction": "问题: 中国的首都是哪里?\n请从以下选项中选择一个正确答案。",
  "choices": {
    "A": "上海",
    "B": "北京",
    "C": "广州",
    "D": "深圳"
  },
  "answer": "B"
}
```
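The following minimal sketch shows how a raw C-Eval row (the shape printed in the loading example) could map onto this format. The instruction template is inferred from the format example and may not match the project's actual pipeline word for word. Reusing the raw `id` as a string is also an assumption, and `to_unified` and `to_jsonl_line` are hypothetical names.

```python
import json
from typing import Any, Dict

# Assumption: template inferred from the format example, not taken from the
# project's own conversion code (shared with its CMMLU conversion).
INSTRUCTION_TEMPLATE = "问题: {question}\n请从以下选项中选择一个正确答案。"


def to_unified(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one raw C-Eval row (id, question, A-D, answer) to a unified record."""
    return {
        "id": str(row["id"]),  # assumption: raw id reused as a string
        "instruction": INSTRUCTION_TEMPLATE.format(question=row["question"]),
        "choices": {key: row[key] for key in ("A", "B", "C", "D")},
        "answer": row["answer"],
    }


def to_jsonl_line(record: Dict[str, Any]) -> str:
    """Serialize a record as one JSONL line; embedded newlines become \\n escapes."""
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    return json.dumps(record, ensure_ascii=False)
```

Writing `to_jsonl_line(to_unified(row)) + "\n"` for every test row would produce one JSON object per line, as the format requires.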
## Reference
- [ceval](https://huggingface.co/datasets/ceval/ceval-exam)
Provider: maas
Created: 2025-08-11
The above content was collected and summarized by 遇见数据集.



