SWE-smith-mini_swe_agent_plus-trajectories-66k
Source: ModelScope community · Updated 2026-01-07 · Indexed 2025-11-15
Download link:
https://modelscope.cn/datasets/Kwai-Klear/SWE-smith-mini_swe_agent_plus-trajectories-66k
Description:
## Dataset: SWE-smith-mini_swe_agent_plus-trajectories-66k
[mini-swe-agent-plus (GitHub)](https://github.com/Kwai-Klear/mini-swe-agent-plus)
[Dataset (Hugging Face)](https://huggingface.co/datasets/Kwai-Klear/SWE-smith-mini_swe_agent_plus-trajectories-66k)
[Klear-AgentForge-8B-SFT model (Hugging Face)](https://huggingface.co/Kwai-Klear/Klear-AgentForge-8B-SFT)
A corpus of ~66k issue-solving trajectories collected with [mini-swe-agent-plus](https://github.com/Kwai-Klear/mini-swe-agent-plus) on issues derived from [SWE-smith](https://huggingface.co/datasets/SWE-bench/SWE-smith). Each trajectory records the agent’s end-to-end process.
<p align="left">
<img
src="https://huggingface.co/datasets/Kwai-Klear/SWE-smith-mini_swe_agent_plus-trajectories-66k/resolve/main/swe_bench_scaling_grid.svg"
width="600"
alt="SWE-bench scaling grid"
/>
</p>
We trained Qwen3-8B on training sets of different sizes. As the figure shows, the solve rate on SWE-bench Verified improves approximately linearly with the logarithm of the data scale (1k → 66k trajectories). Klear-Agent-8B (trained on this dataset with mini-swe-agent-plus) significantly outperforms other ~8B models and matches several open 32B systems.
| Method/Model | Params | Agent Framework | SWE-bench Verified (%) |
|-------------------------|:------:|---------------------|:----------------------:|
| SWE-agent-LM-7B | 7B | SWE-agent | 15.2 |
| SWE-Mirror-LM-7B | 7B | OpenHands | 22.8 |
| SWE-gym-32B | 32B | OpenHands | 20.6 |
| Skywork-SWE-32B | 32B | OpenHands | 38.0 |
| DeepSWE-32B-Preview | 32B | OpenHands | 42.2 |
| SWE-Mirror-LM-32B | 32B | OpenHands | 52.2 |
| SWE-fixer-72B | 72B | SWE-Fixer | 32.8 |
| Lingma-SWE-GPT-72B | 72B | SWE-Syninfer | 32.8 |
| **Klear-Agent-8B-SFT** | 8B | **mini-swe-agent-plus** | **39.0** |
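The log-linear trend described above can be checked against your own measurements with a small least-squares fit. The following is a minimal sketch, not the authors' analysis code: `fit_log_linear` and `predict` are hypothetical helper names, and the block deliberately ships no data points, since the per-size solve rates appear only in the figure.

```python
import math


def fit_log_linear(sizes, solve_rates):
    """Least-squares fit of solve_rate ≈ a + b * log10(size).

    `sizes` are training-set sizes (number of trajectories) and
    `solve_rates` the matching solve rates in percent.
    Returns the intercept `a` and the slope `b` (points gained per 10x data).
    """
    if len(sizes) != len(solve_rates) or len(sizes) < 2:
        raise ValueError("need at least two (size, solve_rate) pairs")

    xs = [math.log10(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(solve_rates) / len(solve_rates)

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all sizes are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, solve_rates))

    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, size):
    """Solve rate predicted by the fitted line for a given data size."""
    return a + b * math.log10(size)
```

Pass in the (size, solve rate) pairs you measure yourself or read off the figure. A roughly constant residual across sizes supports the log-linear reading, and extrapolating far beyond 66k trajectories should be treated with caution.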
### Load with 🤗 Datasets
```python
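from datasets import load_dataset

# Download (or reuse the local cache of) the training split.
ds = load_dataset(
    "Kwai-Klear/SWE-smith-mini_swe_agent_plus-trajectories-66k",
    split="train",
)

print(ds)            # dataset summary: features and number of rows
print(ds[0].keys())  # field names of the first trajectory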
```
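The field names of a trajectory record are not listed in this card, so it can help to summarize one record before writing any processing code. The helper below is a minimal sketch that makes no assumption about the schema; `describe_record` is a hypothetical name, not part of 🤗 Datasets.

```python
def describe_record(record, max_preview=80):
    """Summarize one record as {field: (type name, length or None, preview)}.

    Works on any dict-like record, so it does not depend on the
    dataset's actual field names.
    """
    summary = {}
    for key, value in record.items():
        # Only sized values (strings, lists, dicts, ...) have a length.
        length = len(value) if hasattr(value, "__len__") else None
        preview = repr(value)
        if len(preview) > max_preview:
            preview = preview[:max_preview] + "..."
        summary[key] = (type(value).__name__, length, preview)
    return summary
```

After loading the dataset, `describe_record(ds[0])` returns one entry per field, which shows at a glance which fields hold long text or nested lists.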
Provider:
maas
Created:
2025-11-07



