AceCode-V1.1-69K
Source: ModelScope community. Updated 2026-01-06; indexed 2025-05-17.
Download link:
https://modelscope.cn/datasets/TIGER-Lab/AceCode-V1.1-69K
Description:
# 🂡 AceCode-V1.1-69K
[Paper](https://arxiv.org/abs/2502.01718) |
[GitHub](https://github.com/TIGER-AI-Lab/AceCoder) |
[RM/RL Models](https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba)

We introduce AceCode-V1.1, an updated version of the original AceCode-87K dataset. Every question and its test cases were rewritten by OpenAI's **o1-mini** and then filtered using Qwen Coder 2.5 32B Instruct.

| Subset               | Evol  | Oss   | Stack Python Fns | Overall |
| -------------------- | ----- | ----- | ---------------- | ------- |
| **Before Filtering** |       |       |                  |         |
| \# Samples           | 41548 | 35933 | 69739            | 147220  |
| Avg # Test Cases     | 20.51 | 22.04 | 20.56            | 20.90   |
| **After Filtering**  |       |       |                  |         |
| \# Samples           | 17047 | 17928 | 34058            | 69033   |
| Avg # Test Cases     | 16.84 | 19.46 | 17.52            | 17.85   |

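
The Overall column and the share of samples that survive filtering follow from the per-subset figures. The sketch below recomputes them from the numbers in the table above. The variable and function names are illustrative, and treating the Overall averages as sample-weighted means is an assumption, not something the dataset card states.

```python
# Per-subset figures copied from the table above: (num_samples, avg_test_cases)
before = {"Evol": (41548, 20.51), "Oss": (35933, 22.04), "Stack Python Fns": (69739, 20.56)}
after = {"Evol": (17047, 16.84), "Oss": (17928, 19.46), "Stack Python Fns": (34058, 17.52)}


def overall(stats):
    """Return (total samples, sample-weighted average number of test cases)."""
    total = sum(n for n, _ in stats.values())
    weighted_avg = sum(n * avg for n, avg in stats.values()) / total
    return total, weighted_avg


total_before, avg_before = overall(before)
total_after, avg_after = overall(after)
retention = total_after / total_before  # fraction of samples kept by filtering

print(f"before: {total_before} samples, {avg_before:.2f} tests on average")
print(f"after:  {total_after} samples, {avg_after:.2f} tests on average")
print(f"retained: {retention:.1%}")
```

The totals match the table exactly, and the weighted averages agree with the Overall column to within the rounding of the per-subset inputs. Filtering keeps a little under half of the rewritten samples.
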
Moreover, we trained Qwen2.5-Coder-7B-Base on this dataset with RL. The resulting model, AceCoder-V1.1-7B, scores higher than its backbone on every benchmark listed below:

| Model Name | LiveCodeBench-v4:<br>(2023.5-2024.9) | HumanEval | HumanEval+ | MBPP | MBPP+ | BigCodeBench-Complete Full | BigCodeBench-Complete Hard | BigCodeBench-Instruct Full | BigCodeBench-Instruct Hard |
| -------------------------------------- | ------------------------------------ | --------- | ---------- | ---- | ----- | -------------------------- | -------------------------- | -------------------------- | -------------------------- |
| GPT-4o (0806) | 43.6 | 92.7 | 87.2 | 87.6 | 72.2 | 58.9 | 36.5 | 48.0 | 25.0 |
| DeepCoder-14B-Preview | \- | \- | 92.6 | \- | \- | 49.6 | 22.3 | 38.2 | 18.2 |
| Qwen2.5-Coder-7B-Base (Backbone Model) | 28.7 | 61.6 | 53.0 | 76.9 | 62.9 | 45.8 | 16.2 | 40.2 | 14.2 |
| Qwen2.5-7B-Instruct | 29.0 | 81.7 | 73.2 | 79.4 | 67.7 | 45.6 | 16.9 | 38.4 | 14.2 |
| Qwen2.5-Coder-7B-Instruct | 34.2 | 91.5 | 86.0 | 82.8 | 71.4 | 49.5 | 19.6 | 41.8 | 20.3 |
| AceCoder-V1.1-7B | 35.7 | 88.4 | 83.5 | 84.9 | 73.0 | 53.9 | 27.0 | 41.8 | 23.0 |

## Data Formats
- `id` (str): Unique identifier for each question
- `source` (str): Name of the source dataset the question comes from
- `question` (str): The question text
- `tests` (List[str]): Test cases for the question
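
To make the schema concrete, here is a minimal sketch of one record as a typed dictionary, plus a helper that scores a candidate solution against the `tests` list. It assumes each test string is a standalone Python statement such as an `assert`, which the field list above does not confirm. The names `AceCodeRecord` and `pass_rate` and the example record are hypothetical and were invented for illustration.

```python
from typing import List, TypedDict


class AceCodeRecord(TypedDict):
    """One row, mirroring the four fields listed above."""
    id: str
    source: str
    question: str
    tests: List[str]


def pass_rate(solution_code: str, tests: List[str]) -> float:
    """Fraction of test strings that execute without raising.

    Assumes each test is a standalone Python statement such as an assert.
    """
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        namespace: dict = {}  # fresh namespace so tests cannot affect each other
        try:
            exec(solution_code, namespace)  # define the candidate solution
            exec(test, namespace)           # run one test against it
            passed += 1
        except Exception:
            pass  # any error, including AssertionError, counts as a failure
    return passed / len(tests)


# Hypothetical record, invented for illustration; not taken from the dataset.
example: AceCodeRecord = {
    "id": "example_0",
    "source": "example",
    "question": "Write a function add(a, b) that returns the sum of a and b.",
    "tests": ["assert add(1, 2) == 3", "assert add(-1, 1) == 0", "assert add(2, 2) == 5"],
}

candidate = "def add(a, b):\n    return a + b"
rate = pass_rate(candidate, example["tests"])
print(f"pass rate: {rate:.2f}")
```

The third test in the example is deliberately wrong, so the candidate passes two of three. `exec` runs arbitrary code, so anything beyond a toy example like this should run model-generated solutions in a sandboxed subprocess with a timeout.
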
## Usage
- **Direct use**
```python
import datasets

# Download the dataset from the Hugging Face Hub (cached locally after the first call)
dataset = datasets.load_dataset("TIGER-Lab/AceCode-V1.1-69K")
```
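
After loading, a quick sanity check is to count rows per `source` and average the number of test cases per row. The sketch below works on any iterable of records with the fields listed under Data Formats. The helper name `summarize_by_source` and the toy rows are hypothetical, and the `"train"` split name mentioned afterwards is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def summarize_by_source(rows: Iterable[Mapping]) -> Dict[str, Tuple[int, float]]:
    """Map each `source` value to (row count, average number of tests)."""
    counts: Dict[str, int] = defaultdict(int)
    test_totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        counts[row["source"]] += 1
        test_totals[row["source"]] += len(row["tests"])
    return {src: (n, test_totals[src] / n) for src, n in counts.items()}


# Toy rows invented for illustration; real rows come from the loaded dataset.
toy_rows = [
    {"id": "a", "source": "src_one", "question": "q1", "tests": ["assert True"] * 2},
    {"id": "b", "source": "src_one", "question": "q2", "tests": ["assert True"] * 4},
    {"id": "c", "source": "src_two", "question": "q3", "tests": ["assert True"] * 3},
]
summary = summarize_by_source(toy_rows)
print(summary)
```

With the real data, the same call would be `summarize_by_source(dataset["train"])`, assuming the dataset exposes a `train` split.
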
## Citation
```bibtex
@article{AceCoder,
title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
journal={ArXiv},
year={2025},
volume={abs/2502.01718}
}
```
Provider: maas
Created: 2025-05-12



