AceCodePair-QwenCoderIns32B
Source: ModelScope community. Updated 2025-05-22; indexed 2025-02-08.
Download link:
https://modelscope.cn/datasets/TIGER-Lab/AceCodePair-QwenCoderIns32B
Description:
# 🂡 AceCodePair-QwenCoderIns32B
[Paper](https://arxiv.org/abs/2502.01718) |
[Github](https://github.com/TIGER-AI-Lab/AceCoder) |
[AceCode-89K](https://huggingface.co/datasets/TIGER-Lab/AceCode-89K) |
[AceCodePair-300K](https://huggingface.co/datasets/TIGER-Lab/AceCodePair-300K) |
[RM/RL Models](https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba)
We introduce AceCoder, the first work to propose a fully automated pipeline for synthesizing large-scale, reliable tests for reward model training and reinforcement learning in the coding scenario. To do this, we curated the dataset AceCode-89K: starting from a seed code dataset, we prompt powerful LLMs to "imagine" proper test cases for each coding question and filter out the noisy ones. We then sample inferences from existing coder models and compute their pass rates, which serve as reliable and verifiable rewards both for training the reward model and for conducting reinforcement learning on coder LLMs.
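As a rough illustration of the pass-rate reward described above, the sketch below runs a candidate program against a list of assert-style test cases and reports the fraction that pass. The function name `pass_rate`, the test format, and the toy `add` example are assumptions made for illustration, not the AceCoder implementation; a real pipeline would also execute untrusted code in a sandbox with timeouts.

```python
from typing import List


def pass_rate(program: str, test_cases: List[str]) -> float:
    """Return the fraction of assert-style test cases the program passes.

    Hypothetical helper: each test case is a single Python statement such as
    "assert add(1, 2) == 3". No sandboxing or timeout is applied here.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for test in test_cases:
        namespace: dict = {}
        try:
            exec(program, namespace)  # define the candidate's functions
            exec(test, namespace)     # run one test statement
            passed += 1
        except Exception:
            # Any failure (assertion, runtime, or syntax error) counts as a miss.
            pass
    return passed / len(test_cases)


# Toy candidate with a deliberate bug for negative inputs.
candidate = "def add(a, b):\n    return a + b if a >= 0 else a - b\n"
tests = [
    "assert add(1, 2) == 3",
    "assert add(0, 0) == 0",
    "assert add(-1, 1) == 0",  # fails because of the bug
    "assert add(2, 2) == 4",
]
rate = pass_rate(candidate, tests)
print(rate)  # 0.75
```

The resulting value in [0, 1] is the kind of score that the pair-selection thresholds below operate on.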
- **This dataset is constructed using the same procedure as AceCodePair-300K, by building valid pairs from [TIGER-Lab/AceCode-89K](https://huggingface.co/datasets/TIGER-Lab/AceCode-89K) according to the following rules.** However, it is constructed with data from the 32B version of Qwen2.5-Coder-Instruct and is similarly used to train the 32B AceCodeRM.
  - The chosen program should have a pass rate of at least **0.8**
  - The pass rate difference between the chosen and rejected programs should be at least **0.4**
  - The rejected program should have a pass rate larger than **0**
- This dataset is used to train our reward model:
  - [TIGER-Lab/AceCodeRM-32B](https://huggingface.co/TIGER-Lab/AceCodeRM-32B)
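
The three pair-selection rules above can be sketched as a small filter over the pass rates of the programs sampled for one question. This is a minimal sketch under stated assumptions: the function name `build_pairs`, the dictionary input, and the program labels are hypothetical, and the actual construction code lives in the AceCoder repository.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def build_pairs(
    pass_rates: Dict[str, float],
    min_chosen: float = 0.8,  # rule 1: chosen pass rate >= 0.8
    min_gap: float = 0.4,     # rule 2: chosen - rejected >= 0.4
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) program pairs that satisfy all three rules."""
    pairs = []
    for (chosen, c_rate), (rejected, r_rate) in permutations(pass_rates.items(), 2):
        if (
            c_rate >= min_chosen
            and c_rate - r_rate >= min_gap
            and r_rate > 0  # rule 3: rejected pass rate must be above 0
        ):
            pairs.append((chosen, rejected))
    return pairs


# Hypothetical pass rates for five sampled programs answering one question.
sampled = {
    "prog_a": 1.0,
    "prog_b": 0.5,
    "prog_c": 0.0,
    "prog_d": 0.875,
    "prog_e": 0.25,
}
pairs = build_pairs(sampled)
print(pairs)  # [('prog_a', 'prog_b'), ('prog_a', 'prog_e'), ('prog_d', 'prog_e')]
```

Note that `prog_c` never appears as a rejected program because its pass rate is 0, and `prog_d` is not paired with `prog_b` because their gap (0.375) falls below the 0.4 threshold.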

## Usage
- **Direct use**
```python
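import datasets

# Load the training split of this dataset from the Hugging Face Hub.
# For the sibling dataset, use "TIGER-Lab/AceCodePair-300K" instead.
dataset = datasets.load_dataset("TIGER-Lab/AceCodePair-QwenCoderIns32B", split="train")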
```
- **Use for RM training**: This dataset can be used directly for RM training with the [LlamaFactory](https://github.com/hiyouga/LLaMA-Factory.git) code, where you should set `context_messages` as the key. Please refer to our [Github Code](https://github.com/TIGER-AI-Lab/AceCoder) for details.
## Q&A
If you have any questions, please feel free to shoot us an email.
Provider: maas
Created: 2025-02-06



