AutoCodeBench, AutoCodeBench-Lite, AutoCodeBench-Complete
收藏AutoCodeBench 数据集概述
数据集简介
- 开发团队: 腾讯混元团队
- 核心创新: 通过LLM-Sandbox交互自动生成代码测试基准,解决人工标注耗时且难以扩展的问题
- 主要特点:
- 支持20种编程语言的平衡分布
- 包含高难度、实用性强的问题
- 具有语言多样性
数据集组成
| 数据集名称 | 问题数量 | 特点 |
|---|---|---|
| AutoCodeBench | 3,920 | 完整基准集 |
| AutoCodeBench-Lite | 1,586 | 已被至少两种模型成功解决的问题子集 |
| AutoCodeBench-Complete | 1,000 | 采用3-shot提示的补全式评估框架 |
数据字段说明
question: 编程问题描述canonical_solution: 标准代码解决方案demo_test_func: 包含基础测试用例的公共测试函数full_test_func: 包含全面测试用例的私有测试函数language: 使用的编程语言difficulty: 难度等级(easy/medium/hard)
下载地址
- AutoCodeBench: https://huggingface.co/datasets/tencent/AutoCodeBenchmark/blob/main/autocodebench.jsonl
- AutoCodeBench-Lite: https://huggingface.co/datasets/tencent/AutoCodeBenchmark/blob/main/autocodebench_lite.jsonl
- AutoCodeBench-Complete: https://huggingface.co/datasets/tencent/AutoCodeBenchmark/blob/main/autocodebench_completion_3shot.jsonl
评估方法
- 准备模型输出文件
model_output.jsonl - 拉取多语言沙箱镜像
- 启动沙箱服务
- 验证服务状态
- 计算pass@1指标
系统提示模板
"You are an expert programmer. Your task is to provide a code solution within a single Markdown code block for the given programming problem. Do not include any direct execution commands, test cases, or usage examples within the code block."
引用信息
bibtex @misc{chou2025autocodebenchlargelanguagemodels, title={AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators}, author={Jason Chou and Ao Liu and Yuchi Deng and Zhiying Zeng and Tao Zhang and Haotian Zhu and Jianwei Cai and Yue Mao and Chenchen Zhang and Lingyun Tan and Ziyan Xu and Bohui Zhai and Hengyi Liu and Speed Zhu and Wiggin Zhou and Fengzong Lian}, year={2025}, eprint={2508.09101}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2508.09101}, }
许可证
遵循项目根目录下的LICENSE文件规定




