SWE-rebench
Source: ModelScope community. Updated 2025-12-18; indexed 2025-11-08.
Download link: https://modelscope.cn/datasets/nebius/SWE-rebench
# Dataset Summary
SWE-rebench is a large-scale dataset designed to support training and evaluation of LLM-based software engineering (SWE) agents, building upon and expanding our earlier release, [SWE-bench-extra](https://huggingface.co/datasets/nebius/SWE-bench-extra). It is constructed using a fully automated pipeline that continuously extracts real-world interactive SWE tasks from GitHub repositories at scale, as detailed in our paper [SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents](https://arxiv.org/abs/2505.20411). The dataset currently comprises over 21,000 issue–pull request pairs from 3,400+ Python repositories, each validated for correctness through automated environment setup and test execution. A curated subset of these tasks also forms the basis of our continuously updated [SWE-rebench leaderboard](https://swe-rebench.com/leaderboard).
SWE-rebench builds upon and extends the methodology of [SWE-bench](https://www.swebench.com/) by incorporating several key enhancements detailed in our paper, including:
* A fully automated pipeline for continuous task collection.
* LLM-driven extraction and validation of environment installation instructions.
* An automated LLM-based task quality assessment pipeline that annotates tasks with labels such as clarity, complexity, or test patch validity.
We’ve released 7,500 pre-built Docker images used in our RL pipeline. They’re publicly available on [Docker Hub](https://hub.docker.com/repositories/swerebench). You do not need to build them yourself.
# News
[2025/08/05] Uploaded the corresponding Docker images for 7,500 tasks to Docker Hub.
# How to Use
```python
from datasets import load_dataset
ds = load_dataset('nebius/SWE-rebench')
```
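Once the dataset is loaded, rows can be selected by the fields described in the schema table. The sketch below filters records by `created_at`, for example to keep only tasks created after a model's training cutoff. It assumes `created_at` is an ISO 8601 timestamp string; the helper names and the example rows are hypothetical and stand in for real records.

```python
from datetime import datetime, timezone


def parse_created_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp string into a timezone-aware datetime.

    Assumption: `created_at` is ISO 8601. A trailing "Z" is normalized because
    older Python versions of `fromisoformat` do not accept it.
    """
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Treat naive timestamps as UTC so comparisons never mix naive and aware values.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def created_after(rows, cutoff: datetime):
    """Return the rows whose pull request was created strictly after `cutoff`."""
    return [row for row in rows if parse_created_at(row["created_at"]) > cutoff]


# Hypothetical rows standing in for dataset records (not real instances).
example_rows = [
    {"instance_id": "owner__repo-1", "created_at": "2024-03-01T12:00:00Z"},
    {"instance_id": "owner__repo-2", "created_at": "2025-02-10 08:30:00+00:00"},
]
cutoff = datetime(2025, 1, 1, tzinfo=timezone.utc)
recent = created_after(example_rows, cutoff)
```

The same predicate could be passed to `Dataset.filter` on whichever split you load, instead of building a list in memory.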
# Dataset Structure
The SWE-rebench dataset schema extends the original SWE-bench schema with additional fields to support richer analysis. The complete schema is detailed in the table below. For more information about the data and the methodology behind its collection, please refer to our paper.
| Field name | Type | Description |
|----------------------------|--------|-------------------------------------------------------------------------------------------------|
| `instance_id` | str | A formatted instance identifier, usually of the form `repo_owner__repo_name-PR-number`. |
| `patch` | str | The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue. |
| `repo` | str | The repository owner/name identifier from GitHub. |
| `base_commit` | str | The commit hash of the repository representing the HEAD of the repository before the solution PR is applied. |
| `hints_text` | str | Comments made on the issue before the creation date of the solution PR’s first commit. |
| `created_at` | str | The creation date of the pull request. |
| `test_patch` | str | A test-file patch that was contributed by the solution PR. |
| `problem_statement` | str | The issue title and body. |
| `version` | str | Installation version to use for running evaluation. |
| `environment_setup_commit` | str | Commit hash to use for environment setup and installation. |
| `FAIL_TO_PASS` | str | A JSON list of strings that represent the set of tests resolved by the PR and tied to the issue resolution. |
| `PASS_TO_PASS` | str | A JSON list of strings that represent tests that should pass before and after the PR application. |
| `meta` | str | A JSON dictionary indicating whether the instance is lite, along with a list of failed lite validators if it is not. |
| `license_name` | str | The type of license of the repository. |
| `install_config` | str | Installation configuration for setting up the repository. |
| `requirements` | str | Frozen requirements for the repository. |
| `environment` | str | Environment configuration for the repository. |
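Since the table lists `FAIL_TO_PASS`, `PASS_TO_PASS`, and `meta` as strings holding JSON, they typically need decoding before use. The following is a minimal sketch, not part of any official tooling: `decode_json_field` and `decode_instance` are hypothetical helpers, the example record is made up, and the `is_lite` key inside `meta` is an assumed name used only for illustration. Values that arrive already decoded are passed through unchanged.

```python
import json


def decode_json_field(value, default):
    """Decode a JSON-encoded string; pass through values that are already decoded."""
    if value is None or value == "":
        return default
    if isinstance(value, str):
        return json.loads(value)
    return value


def decode_instance(row: dict) -> dict:
    """Return a copy of `row` with the JSON-encoded fields parsed into Python objects."""
    decoded = dict(row)
    decoded["FAIL_TO_PASS"] = decode_json_field(row.get("FAIL_TO_PASS"), [])
    decoded["PASS_TO_PASS"] = decode_json_field(row.get("PASS_TO_PASS"), [])
    decoded["meta"] = decode_json_field(row.get("meta"), {})
    return decoded


# Hypothetical record: test names and the "is_lite" key are made up for illustration.
example_row = {
    "instance_id": "owner__repo-123",
    "FAIL_TO_PASS": '["tests/test_core.py::test_fix"]',
    "PASS_TO_PASS": '["tests/test_core.py::test_existing", "tests/test_io.py::test_read"]',
    "meta": '{"is_lite": false}',
}
instance = decode_instance(example_row)
```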
To execute tasks from SWE-rebench (i.e., set up their environments, apply patches, and run tests), we provide a [fork](https://github.com/SWE-rebench/SWE-bench-fork) of the original SWE-bench execution framework, adapted for our dataset's structure and features.
Our fork is based on release `4.0.3` of the SWE-bench framework. The primary modification adds the ability to source environment installation constants directly from the `install_config` field of each SWE-rebench task instance, which allows for more flexible, task-specific environment setups.
You can find the details of this modification in the [following commit](https://github.com/SWE-rebench/SWE-bench-fork/commit/980d0cca8aa4e73f1d9f894e906370bef8c4de8a).
To build the necessary Docker images and run agents on SWE-rebench tasks, you have two main options:
1. **Use our SWE-bench fork directly:** Clone the fork and utilize its scripts for building images and executing tasks. The framework will automatically use the `install_config` from each task.
2. **Integrate similar functionality into your existing codebase:** If you have your own execution framework based on SWE-bench or a different system, you can adapt it by implementing a similar mechanism to parse and utilize the `install_config` field from the SWE-rebench task instances. The aforementioned commit can serve as a reference for this integration.
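For the second option, the core step is reading `install_config` from each instance and turning it into setup commands. The sketch below illustrates that idea only and is not the fork's implementation. It assumes the config carries SWE-bench-style keys (`python`, `pre_install`, `install`, `test_cmd`); `plan_environment` is a hypothetical helper and the example config is invented. Consult the referenced commit for the actual logic.

```python
import json


def plan_environment(install_config) -> dict:
    """Turn an `install_config` value into an ordered setup plan.

    Assumption: the config uses SWE-bench-style keys ("python", "pre_install",
    "install", "test_cmd"); any key that is absent is skipped.
    """
    # The dataset card lists the field as a string, so decode JSON when needed.
    config = json.loads(install_config) if isinstance(install_config, str) else dict(install_config)

    # Accept either a single command or a list of commands for "pre_install".
    pre_install = config.get("pre_install") or []
    if isinstance(pre_install, str):
        pre_install = [pre_install]

    setup = list(pre_install)
    if config.get("install"):
        setup.append(config["install"])

    return {
        "python": config.get("python"),
        "setup": setup,
        "test": config.get("test_cmd"),
    }


# Hypothetical config, made up for illustration; real instances may differ.
example_config = json.dumps(
    {
        "python": "3.9",
        "pre_install": ["apt-get update"],
        "install": "pip install -e .",
        "test_cmd": "pytest -rA",
    }
)
plan = plan_environment(example_config)
```

Keeping the plan as plain data, rather than executing commands directly, makes it easy to feed into an existing image-building step.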
# License
The dataset is licensed under the Creative Commons Attribution 4.0 license. However, please respect the license of each specific repository on which a particular instance is based. To facilitate this, the license of each repository at the time of the commit is provided for every instance.
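Because each instance records its repository's license in `license_name`, instances can be screened against an allow-list before use. This is a minimal sketch under stated assumptions: `filter_by_license` is a hypothetical helper, and the license strings in the example are invented, so check the values that actually appear in the dataset before relying on exact names.

```python
def filter_by_license(rows, allowed_licenses):
    """Keep rows whose `license_name` matches an allowed name (case-insensitive)."""
    allowed = {name.strip().lower() for name in allowed_licenses}
    return [
        row
        for row in rows
        if (row.get("license_name") or "").strip().lower() in allowed
    ]


# Hypothetical rows; the license strings are illustrative, not taken from the dataset.
example_rows = [
    {"instance_id": "owner__repo-1", "license_name": "MIT License"},
    {"instance_id": "owner__repo-2", "license_name": "GNU General Public License v3.0"},
    {"instance_id": "owner__repo-3", "license_name": None},
]
permissive = filter_by_license(example_rows, ["MIT License", "Apache License 2.0"])
```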
# Citation
```bibtex
@misc{badertdinov2025swerebenchautomatedpipelinetask,
title={SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents},
author={Ibragim Badertdinov and Alexander Golubev and Maksim Nekrashevich and Anton Shevtsov and Simon Karasik and Andrei Andriushchenko and Maria Trofimova and Daria Litvintseva and Boris Yangel},
year={2025},
eprint={2505.20411},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2505.20411}
}
```
Provider: maas
Created: 2025-10-28



