AceCodePair-300K

Name: AceCodePair-300K
Creator: maas
Published: 2026-01-06 16:21:37
License: No description available

ModelScope community: updated 2026-01-06, indexed 2025-02-08

Download link:

https://modelscope.cn/datasets/TIGER-Lab/AceCodePair-300K


Description:

# 🂡 AceCode-87K

[Paper](https://arxiv.org/abs/2502.01718) | [Github](https://github.com/TIGER-AI-Lab/AceCoder) | [AceCode-87K](https://huggingface.co/datasets/TIGER-Lab/AceCode-87K) | [AceCodePair-300K](https://huggingface.co/datasets/TIGER-Lab/AceCodePair-300K) | [RM/RL Models](https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba)

We introduce AceCoder, the first work to propose a fully automated pipeline for synthesizing large-scale, reliable tests for reward model training and reinforcement learning in the coding scenario. To do this, we curated the dataset AceCode-87K: starting from a seed code dataset, we prompt powerful LLMs to "imagine" proper test cases for each coding question and filter out the noisy ones. We then sample inferences from existing coder models and compute their pass rates as reliable, verifiable rewards for both training the reward model and conducting reinforcement learning for coder LLMs.

- **This dataset is the official AceCodePair-300K, built by constructing valid pairs from [TIGER-Lab/AceCode-87K](https://huggingface.co/datasets/TIGER-Lab/AceCode-87K) using the following rule:**
  - The chosen program should have a pass rate of at least **0.8**
  - The pass rate difference between the chosen and rejected programs should be at least **0.4**
  - The rejected program should have a pass rate larger than **0**
- This dataset is used to train our reward models:
  - [TIGER-Lab/AceCodeRM-7B](https://huggingface.co/TIGER-Lab/AceCodeRM-7B) (use the "default" subset)
  - [TIGER-Lab/AceCodeRM-32B](https://huggingface.co/TIGER-Lab/AceCodeRM-32B) (use the "32B" subset)

![https://tiger-ai-lab.github.io/AceCoder/static/images/ac_overview.png](https://tiger-ai-lab.github.io/AceCoder/static/images/ac_overview.png)

## Usage

- **Direct use**

```python
import datasets

dataset = datasets.load_dataset("TIGER-Lab/AceCodePair-300K", split='train')
```

- **Use for RM training**: This dataset can be used directly for RM training with the [LlamaFactory](https://github.com/hiyouga/LLaMA-Factory.git) code, where you should set `context_messages` as the key. Please refer to our [Github Code](https://github.com/TIGER-AI-Lab/AceCoder) for details. The "default" subset is used to train the 7B Qwen Coder Instruct 2.5, whereas the "32B" subset is used to train the 32B Qwen Coder Instruct 2.5.

## Citation

```bibtex
@article{AceCoder,
    title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
    author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
    journal={ArXiv},
    year={2025},
    volume={abs/2502.01718}
}
```
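The pair-construction rule stated in the dataset description (chosen pass rate of at least 0.8, a gap of at least 0.4, rejected pass rate above 0) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `(program_text, pass_rate)` input shape, and the `chosen`/`rejected` output keys are assumptions made for illustration, and the actual column names in the dataset may differ.

```python
from itertools import permutations

# Thresholds taken from the rule stated in the dataset description.
MIN_CHOSEN_PASS_RATE = 0.8
MIN_PASS_RATE_GAP = 0.4


def is_valid_pair(chosen_rate: float, rejected_rate: float) -> bool:
    """Return True if a (chosen, rejected) pair meets all three stated conditions."""
    return (
        chosen_rate >= MIN_CHOSEN_PASS_RATE
        and chosen_rate - rejected_rate >= MIN_PASS_RATE_GAP
        and rejected_rate > 0
    )


def build_pairs(programs):
    """Build preference pairs for one question.

    `programs` is a hypothetical list of (program_text, pass_rate) tuples;
    the output keys "chosen" and "rejected" are illustrative names only.
    """
    pairs = []
    # Try every ordered pair so each program is considered in both roles.
    for (chosen, c_rate), (rejected, r_rate) in permutations(programs, 2):
        if is_valid_pair(c_rate, r_rate):
            pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs
```

Under these assumptions, a question with sampled programs at pass rates 1.0, 0.5, and 0.0 yields a single pair (1.0 chosen, 0.5 rejected), because the program that passes no tests is excluded as a rejected candidate.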


Provider:

maas

Created:

2025-02-04
