CoderForge-Preview

Name: CoderForge-Preview
Creator: maas
Published: 2026-05-14 10:45:16
License: not specified

ModelScope Community. Updated: 2026-05-14. Indexed: 2026-03-07.

Download link:

https://modelscope.cn/datasets/togethercomputer/CoderForge-Preview


Resource description:

# CoderForge-Preview: SOTA Open Dataset for Training Efficient Agents

**CoderForge-Preview** is **the largest open test-verified coding agent dataset.**

Fine-tuning Qwen-3 32B on it, we boost **SWE-Bench Verified performance** **23.0% → 59.4% pass@1** and rank **#1 among open-data** and **#2 among open-weight models ≤32B parameters.**

![top_open_data_models](https://cdn-uploads.huggingface.co/production/uploads/63972847b3e2256c9ce1307b/UG0fXsbVAMxoxxuC0kRLe.png)

![top_open_weight_models](https://cdn-uploads.huggingface.co/production/uploads/63972847b3e2256c9ce1307b/y6MKx8AQTGkeG4N8kGDqd.png)

## Limitations

- **Adaptability to different scaffolds:** We generated all trajectories using a **single scaffold** and **fixed tool set** (no permutations). Models trained via SFT on this data may perform worse when deployed with **different scaffolds, tools, prompt templates, or tool-call formats**.
- **Task scope:** Our data sources skew toward **bug fixing**. As a result, models trained on this dataset may be less capable on tasks outside that scope, such as **feature implementation**, **refactors**, or **design-heavy changes**.
- **User interaction:** Real coding-agent usage often involves **ongoing user collaboration**, with user messages appearing throughout the trajectory, not just at the start. This kind of interactive supervision is still largely missing from open coding-agent datasets (including ours). Models trained on SFT alone may therefore underperform in **interactive settings**.

## Conclusion

In this release, we focus on **large-scale agentic data generation**: assembling **51K distinct open-source tasks** and generating **long-horizon, multi-step SFT trajectories**. Our results show that a simple data-generation pipeline combined with **pure SFT** can produce substantial gains in coding-agent performance.
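To make the headline numbers concrete, the sketch below treats pass@1 as the percentage of tasks resolved on a single attempt and works out the size of the reported 23.0% → 59.4% improvement. The function names and the five-task outcome list are hypothetical and purely illustrative; they are not part of the release or of any SWE-Bench tooling.

```python
def pass_at_1(resolved):
    """Percentage of tasks resolved on a single attempt.

    `resolved` holds one boolean per task (hypothetical input format).
    """
    if not resolved:
        raise ValueError("need at least one task outcome")
    return 100.0 * sum(resolved) / len(resolved)


def gain(before, after):
    """Return (absolute gain in percentage points, ratio after/before)."""
    return after - before, after / before


# Hypothetical outcomes for five tasks (illustration only, not real data).
toy_outcomes = [True, False, True, True, False]
toy_score = pass_at_1(toy_outcomes)

# The two pass@1 figures reported in this dataset card.
abs_gain, ratio = gain(23.0, 59.4)

print(f"toy pass@1: {toy_score:.1f}%")
print(f"reported gain: +{abs_gain:.1f} points ({ratio:.2f}x)")
```

Under these definitions, the reported change is an absolute gain of about 36.4 percentage points, or roughly 2.58 times the base model's score.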
### Next steps

Moving forward, we plan to:

- **Scale data generation further** (more tasks, more trajectories, longer horizons where helpful)
- Generate data under **multiple scaffolds**, **tool sets**, and **prompt/tool-call permutations** to improve robustness and transfer
- Train **larger models** and run more systematic **hyperparameter tuning**
- Follow the **DeepSWE** training paradigm by applying **agentic reinforcement learning** on top of our fine-tuned model to drive further performance gains

## Citation

```bibtex
@misc{CoderForge2026,
  title = {CoderForge-Preview: SOTA Open Dataset for Training Efficient Agents},
  author = {Ariyak, Alpay and Zhang, Junda and Wang, Junxiong and Zhu, Shang and Bianchi, Federico and Srivastava, Sanjana and Panda, Ashwinee and Bharti, Siddhant and Xu, Chenfeng and Heo, John and Wu, Xiaoxia Shirley and Zhou, James and Liang, Percy and Song, Leon and Zhang, Ce and Athiwaratkun, Ben and Zhou, Zhongzhu and Wu, Qingyang},
  year = {2026},
  month = feb,
  publisher = {TogetherAI Blog},
  url = {https://www.together.ai/blog/coderforge-preview},
  note = {Project core leads: Alpay Ariyak; Zhongzhu Zhou; Qingyang Wu}
}
```


Provider:

maas

Created:

2026-02-26
