Download link:

https://modelscope.cn/datasets/AI-ModelScope/SWE-bench-extra


Resource description:

*Note: This dataset has an improved and significantly larger successor: [SWE-rebench](https://huggingface.co/datasets/nebius/SWE-rebench).*

# Dataset Summary

SWE-bench Extra is a dataset for training or evaluating agentic systems that specialize in resolving GitHub issues. It follows the methodology used to build the SWE-bench benchmark and includes 6,415 issue-pull request pairs sourced from 1,988 Python repositories.

# Dataset Description

The SWE-bench Extra dataset supports the development of software engineering agents capable of autonomously solving GitHub issues. The data collection process, based on the SWE-bench methodology, involves the following steps:

1. **Issue and Pull Request Collection**: Issues are gathered and linked with pull requests that successfully resolve them.
2. **Filtering**: Instances are filtered based on attributes such as issue descriptions, relevant code paths, and test patches.
3. **Execution-based Validation**: The project environments are set up and tests are run to verify that they execute correctly.

For a more detailed description of the data collection process, please refer to our blog post [Scaling data collection for training software engineering agents](https://nebius.com/blog/posts/scaling-data-collection-for-training-swe-agents).

As an example use case of this dataset, we used SWE-bench-extra instances to generate a dataset of 80,036 trajectories, [`nebius/swe-agent-trajectories`](https://huggingface.co/datasets/nebius/swe-agent-trajectories). We then trained an action generator model that achieves a score of 19.2% on a subset of 50 random instances from the SWE-bench Verified benchmark, a 30% relative improvement over its parent model Qwen2.5-72B-Instruct, which scored 14.8%.
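The 30% figure can be recomputed from the two scores quoted above. The following is a minimal sketch of that arithmetic; the helper name `relative_improvement` is hypothetical and introduced only for illustration.

```python
def relative_improvement(new_score: float, base_score: float) -> float:
    """Return the gain of new_score over base_score as a fraction of base_score."""
    return (new_score - base_score) / base_score


# Scores quoted in the text, in percent of instances resolved.
gain = relative_improvement(19.2, 14.8)
print(f"{gain:.1%}")  # roughly 30%, matching the rounded figure in the text
```

Note that this is a relative gain; the absolute difference between the two scores is 4.4 percentage points.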
Further augmenting the action generator with a guided search based on a critic model, also trained on this data, achieves 40.6% on the full SWE-bench Verified benchmark, which is state-of-the-art among agents using solely open-weight models. You can read more about this agent in our blog post, ["Leveraging Training and Search for Better Software Engineering Agents"](https://nebius.com/blog/posts/training-and-search-for-software-engineering-agents).

# How to Use

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub
ds = load_dataset('nebius/SWE-bench-extra')
```

# Dataset Statistics

Average, 75th percentile, and maximum values characterizing various attributes of the collected instances. Statistics are micro-averaged without grouping by repository.

| Data       | Type             | Mean      | p75    | Max       |
|------------|------------------|-----------|--------|-----------|
| Issue text | Length (words)   | 111.5     | 146    | 1,294     |
| Code base  | Files (Non-test) | 71.71     | 72.00  | 2,264     |
|            | Lines (Non-test) | 15,163.38 | 13,777 | 1,039,288 |
| Gold patch | Files edited     | 2.6       | 3      | 7         |
|            | Lines edited     | 56        | 76     | 300       |
| Tests      | Fail to Pass     | 10.94     | 5      | 4,941     |
|            | Total            | 58.5      | 49     | 7,820     |

# Dataset Structure

The dataset contains the following fields. It includes all fields from SWE-bench and adds a `meta` column, which indicates whether the instance meets the "lite" criteria and, if not, lists the failed validators.

| Field name                 | Type | Description |
|----------------------------|------|-------------|
| `instance_id`              | str  | A formatted instance identifier, usually as `repo_owner__repo_name-PR-number`. |
| `patch`                    | str  | The gold patch: the patch generated by the PR (minus test-related code) that resolved the issue. |
| `repo`                     | str  | The repository owner/name identifier from GitHub. |
| `base_commit`              | str  | The commit hash of the repository representing the HEAD of the repository before the solution PR is applied. |
| `hints_text`               | str  | Comments made on the issue before the creation date of the solution PR's first commit. |
| `created_at`               | str  | The creation date of the pull request. |
| `test_patch`               | str  | A test-file patch that was contributed by the solution PR. |
| `problem_statement`        | str  | The issue title and body. |
| `version`                  | str  | Installation version to use for running evaluation. |
| `environment_setup_commit` | str  | Commit hash to use for environment setup and installation. |
| `FAIL_TO_PASS`             | str  | A JSON list of strings that represent the set of tests resolved by the PR and tied to the issue resolution. |
| `PASS_TO_PASS`             | str  | A JSON list of strings that represent tests that should pass before and after the PR application. |
| `meta`                     | str  | A JSON dictionary indicating whether the instance is lite, along with a list of failed lite validators if it is not. |
| `license`                  | str  | The type of license of the repository. |

To execute instances within SWE-bench, you need to provide a default recipe for dependency installation. The constants required for running these instances are described in this [constants.py](https://huggingface.co/datasets/nebius/SWE-bench-extra/blob/main/constants.py).

# License

The dataset is licensed under the Creative Commons Attribution 4.0 license. However, please respect the license of each specific repository on which a particular instance is based. To facilitate this, the license of each repository at the time of the commit is provided for every instance.
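As described under Dataset Structure, `FAIL_TO_PASS`, `PASS_TO_PASS`, and `meta` are stored as JSON-encoded strings, so they usually need to be decoded before use. The following is a minimal sketch of that step. The helper `decode_instance` and the sample row are hypothetical and hand-written for illustration; the row is not taken from the dataset, and because the keys inside `meta` are not documented here, the sketch treats it as an opaque dictionary.

```python
import json
from typing import Any

# Fields the card describes as JSON-encoded strings.
JSON_FIELDS = ("FAIL_TO_PASS", "PASS_TO_PASS", "meta")


def decode_instance(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of row with the JSON-encoded string fields parsed."""
    decoded = dict(row)
    for name in JSON_FIELDS:
        value = decoded.get(name)
        if isinstance(value, str):
            decoded[name] = json.loads(value)
    return decoded


# Hypothetical row; all values are invented for illustration only.
sample_row = {
    "instance_id": "example-owner__example-repo-123",
    "repo": "example-owner/example-repo",
    "FAIL_TO_PASS": '["tests/test_core.py::test_fix"]',
    "PASS_TO_PASS": '["tests/test_core.py::test_existing", "tests/test_io.py::test_read"]',
    "meta": "{}",
}

instance = decode_instance(sample_row)
print(len(instance["FAIL_TO_PASS"]), len(instance["PASS_TO_PASS"]))
```

With the real dataset, the same helper could be applied to each row while iterating over a split returned by `load_dataset`.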

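The Dataset Statistics section says its figures are micro-averaged without grouping by repository, meaning every instance counts equally regardless of which repository it comes from. The sketch below contrasts that with a per-repository (macro) average. The repository names and word counts are invented for illustration and are not drawn from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (repo, issue length in words) pairs, invented for illustration.
instances = [("a/x", 100), ("a/x", 200), ("a/x", 300), ("b/y", 40)]

# Micro average: pool all instances, one vote per instance.
micro = mean(length for _, length in instances)

# Macro average: average within each repository first, then across repositories.
by_repo = defaultdict(list)
for repo, length in instances:
    by_repo[repo].append(length)
macro = mean(mean(lengths) for lengths in by_repo.values())

print(micro, macro)
```

With micro-averaging, repositories that contribute many instances weigh more heavily on the reported means.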

Use cases: