Download link:

https://modelscope.cn/datasets/TIGER-Lab/AceCode-V1.1-69K

Resource description:

# 🂡 AceCode-V1.1-69K

[Paper](https://arxiv.org/abs/2502.01718) | [Github](https://github.com/TIGER-AI-Lab/AceCoder) | [RM/RL Models](https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba)

We introduce AceCode-V1.1, an updated version of the original AceCode-87K dataset. Each question and its test cases were rewritten by OpenAI's **o1-mini**, then filtered using Qwen Coder 2.5 32B Instruct.

| Subset | Evol | Oss | Stack Python Fns | Overall |
| ---------------- | ----- | ----- | ---------------- | ------- |
| **Before Filtering** | | | | |
| \# Samples | 41548 | 35933 | 69739 | 147220 |
| Avg # Test Cases | 20.51 | 22.04 | 20.56 | 20.90 |
| **After Filtering** | | | | |
| \# Samples | 17047 | 17928 | 34058 | 69033 |
| Avg # Test Cases | 16.84 | 19.46 | 17.52 | 17.85 |

We also trained Qwen2.5-Coder-7B-Base on this dataset with RL. The resulting model (AceCoder-V1.1-7B in the table below) scores higher than its backbone on every benchmark listed:

| Model Name | LiveCodeBench-v4:<br>(2023.5-2024.9) | HumanEval | HumanEval+ | MBPP | MBPP+ | BigCodeBench-Complete Full | BigCodeBench-Complete Hard | BigCodeBench-Instruct Full | BigCodeBench-Instruct Hard |
| -------------------------------------- | ------------------------------------ | --------- | ---------- | ---- | ----- | -------------------------- | -------------------------- | -------------------------- | -------------------------- |
| GPT-4o (0806) | 43.6 | 92.7 | 87.2 | 87.6 | 72.2 | 58.9 | 36.5 | 48.0 | 25.0 |
| DeepCoder-14B-Preview | \- | \- | 92.6 | \- | \- | 49.6 | 22.3 | 38.2 | 18.2 |
| Qwen2.5-Coder-7B-Base (Backbone Model) | 28.7 | 61.6 | 53.0 | 76.9 | 62.9 | 45.8 | 16.2 | 40.2 | 14.2 |
| Qwen2.5-7B-Instruct | 29.0 | 81.7 | 73.2 | 79.4 | 67.7 | 45.6 | 16.9 | 38.4 | 14.2 |
| Qwen2.5-Coder-7B-Instruct | 34.2 | 91.5 | 86.0 | 82.8 | 71.4 | 49.5 | 19.6 | 41.8 | 20.3 |
| AceCoder-V1.1-7B | 35.7 | 88.4 | 83.5 | 84.9 | 73.0 | 53.9 | 27.0 | 41.8 | 23.0 |

![AceCoder overview](https://tiger-ai-lab.github.io/AceCoder/static/images/ac_overview.png)

## Data Formats

- `id` (str): Unique identifier for each question
- `source` (str): The dataset the question was drawn from
- `question` (str): The question text
- `tests` (List[str]): Test cases for the question

## Usage

- **Direct use**

```python
import datasets

# Download the dataset from the Hugging Face Hub (cached locally after the first call).
dataset = datasets.load_dataset("TIGER-Lab/AceCode-V1.1-69K")
```

## Citation

```bibtex
@article{AceCoder,
    title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
    author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
    journal={ArXiv},
    year={2025},
    volume={abs/2502.01718}
}
```
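As an illustration of how the fields listed under Data Formats might be used, the sketch below scores a candidate solution against a record's `tests`. It assumes each entry in `tests` is a standalone Python `assert` statement, which the dataset card does not state explicitly. The helper `pass_rate`, the sample `record`, and both candidate solutions are hypothetical and not taken from the dataset. `exec` runs arbitrary code, so real model outputs should be executed only inside a sandbox.

```python
from typing import Dict, List


def pass_rate(solution_code: str, tests: List[str]) -> float:
    """Return the fraction of test strings that execute without raising.

    Assumption: every test is a self-contained Python statement such as
    "assert f(1) == 2". This helper is illustrative, not part of AceCoder.
    """
    if not tests:
        return 0.0
    namespace: Dict[str, object] = {}
    try:
        # Define the candidate's functions in an isolated namespace.
        exec(solution_code, namespace)
    except Exception:
        # A solution that fails to load passes no tests.
        return 0.0
    passed = 0
    for test in tests:
        try:
            # Run each test in a shallow copy so tests cannot rebind names
            # seen by later tests.
            exec(test, dict(namespace))
            passed += 1
        except Exception:
            pass
    return passed / len(tests)


# Hypothetical record in the documented format (not a real dataset row).
record = {
    "id": "example_0",
    "source": "hypothetical",
    "question": "Write a function add(a, b) that returns the sum of a and b.",
    "tests": [
        "assert add(1, 2) == 3",
        "assert add(-1, 1) == 0",
        "assert add(0, 0) == 0",
    ],
}

correct = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"

print(f"correct: {pass_rate(correct, record['tests']):.2f}")
print(f"buggy:   {pass_rate(buggy, record['tests']):.2f}")
```

A pass rate of this kind is one simple way to turn test cases into a scalar signal. Whether it matches the reward used to train AceCoder-V1.1-7B is not specified here.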


Use cases: