
AceCode-V1.1-69K

Source: ModelScope community, updated 2026-01-06, indexed 2025-05-17
Download link: https://modelscope.cn/datasets/TIGER-Lab/AceCode-V1.1-69K
Description:
# 🂡 AceCode-V1.1-69K

[Paper](https://arxiv.org/abs/2502.01718) | [GitHub](https://github.com/TIGER-AI-Lab/AceCoder) | [RM/RL Models](https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba)

We introduce AceCode-V1.1, an updated version of the original AceCode-87K dataset. Every question and its test cases were rewritten by OpenAI's **o1-mini**, then filtered using Qwen Coder 2.5 32B Instruct. Filtering keeps 69033 of the 147220 rewritten samples.

| Subset               | Evol  | Oss   | Stack Python Fns | Overall |
| -------------------- | ----- | ----- | ---------------- | ------- |
| **Before Filtering** |       |       |                  |         |
| \# Samples           | 41548 | 35933 | 69739            | 147220  |
| Avg # Test Cases     | 20.51 | 22.04 | 20.56            | 20.90   |
| **After Filtering**  |       |       |                  |         |
| \# Samples           | 17047 | 17928 | 34058            | 69033   |
| Avg # Test Cases     | 16.84 | 19.46 | 17.52            | 17.85   |

We also trained Qwen2.5-Coder-7B-Base on this dataset with RL. The AceCoder-V1.1-7B row below scores higher than the backbone model on every benchmark in the table:

| Model Name                             | LiveCodeBench-v4:<br>(2023.5-2024.9) | HumanEval | HumanEval+ | MBPP | MBPP+ | BigCodeBench-Complete Full | BigCodeBench-Complete Hard | BigCodeBench-Instruct Full | BigCodeBench-Instruct Hard |
| -------------------------------------- | ------------------------------------ | --------- | ---------- | ---- | ----- | -------------------------- | -------------------------- | -------------------------- | -------------------------- |
| GPT-4o (0806)                          | 43.6                                 | 92.7      | 87.2       | 87.6 | 72.2  | 58.9                       | 36.5                       | 48.0                       | 25.0                       |
| DeepCoder-14B-Preview                  | \-                                   | \-        | 92.6       | \-   | \-    | 49.6                       | 22.3                       | 38.2                       | 18.2                       |
| Qwen2.5-Coder-7B-Base (Backbone Model) | 28.7                                 | 61.6      | 53.0       | 76.9 | 62.9  | 45.8                       | 16.2                       | 40.2                       | 14.2                       |
| Qwen2.5-7B-Instruct                    | 29.0                                 | 81.7      | 73.2       | 79.4 | 67.7  | 45.6                       | 16.9                       | 38.4                       | 14.2                       |
| Qwen2.5-Coder-7B-Instruct              | 34.2                                 | 91.5      | 86.0       | 82.8 | 71.4  | 49.5                       | 19.6                       | 41.8                       | 20.3                       |
| AceCoder-V1.1-7B                       | 35.7                                 | 88.4      | 83.5       | 84.9 | 73.0  | 53.9                       | 27.0                       | 41.8                       | 23.0                       |

![AceCoder overview](https://tiger-ai-lab.github.io/AceCoder/static/images/ac_overview.png)

## Data Formats

- `id` (str): Unique identifier for each question
- `source` (str): The source dataset the question comes from
- `question` (str): The question text
- `tests` (List[str]): Test cases for the question

## Usage

- **Direct use**

```python
import datasets

# Download the dataset from the Hugging Face Hub (requires network access).
dataset = datasets.load_dataset("TIGER-Lab/AceCode-V1.1-69K")
```

## Citation

```bibtex
@article{AceCoder,
    title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
    author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
    journal={ArXiv},
    year={2025},
    volume={abs/2502.01718}
}
```
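To make the record layout from the Data Formats section concrete, here is a minimal sketch of how the `tests` field might be used to score a candidate solution. It assumes each entry in `tests` is a standalone Python statement such as an `assert`; the dataset card only states that `tests` is a list of strings, so treat that as an assumption. The `pass_rate` helper and the sample `record` are hypothetical illustrations, not part of the dataset or the AceCoder codebase.

```python
from typing import Dict, List


def pass_rate(solution_code: str, tests: List[str]) -> float:
    """Return the fraction of test statements that run without raising."""
    namespace: Dict[str, object] = {}
    exec(solution_code, namespace)  # defines the candidate function(s)
    passed = 0
    for test in tests:
        try:
            exec(test, namespace)
            passed += 1
        except Exception:
            pass  # a failed assertion or runtime error counts as a failure
    return passed / len(tests) if tests else 0.0


# Hypothetical record mirroring the four documented fields (not real data).
record = {
    "id": "example_0",
    "source": "example",
    "question": "Write a function add(a, b) that returns the sum of a and b.",
    "tests": [
        "assert add(1, 2) == 3",
        "assert add(-1, 1) == 0",
        "assert add(0, 0) == 1",  # deliberately wrong, to show a failure
    ],
}

candidate = "def add(a, b):\n    return a + b\n"
score = pass_rate(candidate, record["tests"])
print(f"{record['id']}: {score:.2f}")
```

Because `exec` runs arbitrary code in the current process, a sketch like this should only be run on trusted input; model-generated solutions would normally be executed in an isolated sandbox with a timeout.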

Provider: maas
Created: 2025-05-12