five

ceval

ModelScope community, updated 2026-05-24, indexed 2025-08-16
Download link:
https://modelscope.cn/datasets/evalscope/ceval
Resource description:
C-Eval is a comprehensive Chinese evaluation suite for foundation models. It consists of 13948 multi-choice questions spanning 52 diverse disciplines and four difficulty levels. Please visit our [website](https://cevalbenchmark.com/) and [GitHub](https://github.com/SJTU-LIT/ceval/tree/main) or check our [paper](https://arxiv.org/abs/2305.08322) for more details.

Each subject consists of three splits: dev, val, and test. The dev set per subject consists of five exemplars with explanations for few-shot evaluation. The val set is intended to be used for hyperparameter tuning. And the test set is for model evaluation.

#### [2025.7.27] We have released the complete C-Eval test set to the community! Now, you can directly evaluate on the C-Eval test set more conveniently.

### Load the data

```python
from datasets import load_dataset

dataset = load_dataset("ceval/ceval-exam", name="computer_network")

print(dataset['val'][0])
# {'id': 0, 'question': '使用位填充方法,以01111110为位首flag,数据为011011111111111111110010,求问传送时要添加几个0____', 'A': '1', 'B': '2', 'C': '3', 'D': '4', 'answer': 'C', 'explanation': ''}
```

More details on loading and using the data are at our [github page](https://github.com/SJTU-LIT/ceval#data).

Please cite our paper if you use our dataset.

```
@article{huang2023ceval,
  title={C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models},
  author={Huang, Yuzhen and Bai, Yuzhuo and Zhu, Zhihao and Zhang, Junlei and Zhang, Jinghan and Su, Tangjun and Liu, Junteng and Lv, Chuancheng and Zhang, Yikai and Lei, Jiayi and Fu, Yao and Sun, Maosong and He, Junxian},
  journal={arXiv preprint arXiv:2305.08322},
  year={2023}
}
```
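The dev split described above is meant to supply few-shot exemplars. The following is a minimal sketch of how such a prompt might be assembled from records with the `question`, `A` to `D`, and `answer` fields shown in the loading example. The prompt template, the `ANSWER_PREFIX` constant, and both helper functions are illustrative assumptions, not the official C-Eval prompt format, and the records are hand-written stand-ins so the snippet runs without downloading anything.

```python
# Hypothetical answer marker; the official evaluation prompt may differ.
ANSWER_PREFIX = "答案:"


def format_example(record, include_answer=True):
    """Render one record as question, four options, and an answer line."""
    lines = [record["question"]]
    for key in ("A", "B", "C", "D"):
        lines.append(f"{key}. {record[key]}")
    answer = record["answer"] if include_answer else ""
    lines.append(f"{ANSWER_PREFIX}{answer}")
    return "\n".join(lines)


def build_few_shot_prompt(dev_records, target_record):
    """Join solved exemplars, then append the target question unanswered."""
    shots = [format_example(r) for r in dev_records]
    shots.append(format_example(target_record, include_answer=False))
    return "\n\n".join(shots)


# Stand-in records (not taken from the dataset).
dev_records = [
    {"question": "1 + 1 = ____", "A": "1", "B": "2", "C": "3", "D": "4", "answer": "B"},
]
target = {"question": "2 + 2 = ____", "A": "2", "B": "3", "C": "4", "D": "5", "answer": "C"}

prompt = build_few_shot_prompt(dev_records, target)
print(prompt)
```

In practice `dev_records` would be the five rows of `dataset['dev']` for a subject and `target` a row from `val` or `test`. The prompt ends with the bare answer marker so that the model's next token can be read as its choice.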

# C-Eval (Test-Split) in Unified JSONL Format

## Dataset Description

This project converts the `test` split of the [C-Eval dataset](https://huggingface.co/datasets/ceval/ceval-exam) into a **unified instruction-style JSONL format** to facilitate the evaluation and testing of Large Language Models (LLMs). C-Eval is a comprehensive Chinese foundational model evaluation suite designed to measure the capabilities of language models in Chinese language and knowledge.

The data in this repository is sourced from the `test` split of the original `ceval` dataset, and adopts exactly the same processing pipeline and data structure as the CMMLU dataset, enabling users to conduct evaluations under a unified framework.

## Data Format

The dataset is in JSONL format, where each line is a standalone JSON object. This structure is carefully designed to fit standard instruction tuning and inference workflows.

Each JSON object contains the following fields:

* `id`: Unique identifier of the sample.
* `instruction`: Instruction text that guides the model to answer the multiple-choice question.
* `choices`: A **dictionary** containing four options, with keys "A", "B", "C", and "D".
* `answer`: The correct answer to the question ('A', 'B', 'C', or 'D').

**Format Example:**

```json
{
  "id": "1",
  "instruction": "问题: 中国的首都是哪里? 请从以下选项中选择一个正确答案。",
  "choices": {
    "A": "上海",
    "B": "北京",
    "C": "广州",
    "D": "深圳"
  },
  "answer": "B"
}
```

## Reference

- [ceval](https://huggingface.co/datasets/ceval/ceval-exam)
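As a rough illustration of the unified format described above, the sketch below maps one original-style C-Eval record to the four documented fields and scores predictions against a JSONL string. The functions `to_unified` and `accuracy` are hypothetical helpers, and the instruction wording is copied from the format example. The repository's actual conversion pipeline is not shown in this document, so treat this as an assumption about its shape.

```python
import json


def to_unified(record):
    """Map an original-style record to the unified fields (assumed template)."""
    instruction = f"问题: {record['question']} 请从以下选项中选择一个正确答案。"
    return {
        "id": str(record["id"]),
        "instruction": instruction,
        "choices": {key: record[key] for key in ("A", "B", "C", "D")},
        "answer": record["answer"],
    }


def accuracy(jsonl_text, predictions):
    """Score predictions (a dict of id -> letter) against JSONL samples."""
    correct = 0
    total = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        sample = json.loads(line)
        total += 1
        if predictions.get(sample["id"]) == sample["answer"]:
            correct += 1
    return correct / total if total else 0.0


# Stand-in record mirroring the format example.
raw = {"id": 1, "question": "中国的首都是哪里?",
       "A": "上海", "B": "北京", "C": "广州", "D": "深圳", "answer": "B"}

unified = to_unified(raw)
# ensure_ascii=False keeps the Chinese text readable in the output file.
jsonl_text = json.dumps(unified, ensure_ascii=False) + "\n"
score = accuracy(jsonl_text, {"1": "B"})
print(jsonl_text, score)
```

Because each line is an independent JSON object, the file can be streamed line by line without loading the whole test set into memory.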
Provider:
maas
Created:
2025-08-11