
SWE-bench-extra

ModelScope Community. Updated 2026-04-16; indexed 2024-12-28.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/SWE-bench-extra
Resource description:
*Note: This dataset has an improved and significantly larger successor: [SWE-rebench](https://huggingface.co/datasets/nebius/SWE-rebench).*

# Dataset Summary

SWE-bench Extra is a dataset that can be used to train or evaluate agentic systems specializing in resolving GitHub issues. It is based on the methodology used to build the SWE-bench benchmark and includes 6,415 Issue-Pull Request pairs sourced from 1,988 Python repositories.

# Dataset Description

The SWE-bench Extra dataset supports the development of software engineering agents capable of autonomously solving GitHub issues. The data collection process, based on the SWE-bench methodology, involves the following steps:

1. **Issue and Pull Request Collection**: Issues are gathered and linked with pull requests that successfully resolve them.
2. **Filtering**: Instances are filtered based on attributes such as issue descriptions, relevant code paths, and test patches.
3. **Execution-based Validation**: The project environments are set up and tests are run to verify that they execute correctly.

For a more detailed description of the data collection process, please refer to our blog post [Scaling data collection for training software engineering agents](https://nebius.com/blog/posts/scaling-data-collection-for-training-swe-agents).

As an example use case of this dataset, we’ve used SWE-bench-extra instances to generate a dataset of 80,036 trajectories, [`nebius/swe-agent-trajectories`](https://huggingface.co/datasets/nebius/swe-agent-trajectories). We’ve then trained an action generator model that achieves a score of 19.2% on a subset of 50 random instances from the SWE-bench Verified benchmark, representing a 30% relative improvement over its parent model Qwen2.5-72B-Instruct, which scored 14.8%. Further augmenting the action generator with a guided search based on a critic model, also trained on this data, achieves 40.6% on the full SWE-bench Verified benchmark, which is state-of-the-art among agents using solely open-weight models. You can read more about this agent in our blog post, [“Leveraging Training and Search for Better Software Engineering Agents”](https://nebius.com/blog/posts/training-and-search-for-software-engineering-agents).

# How to Use

```python
from datasets import load_dataset
ds = load_dataset('nebius/SWE-bench-extra')
```

# Dataset Statistics

Average, 75th percentile, and maximum values characterizing various attributes of the collected instances. Statistics are micro-averaged without grouping by repository.

| Data       | Type             | Mean      | p75    | Max       |
|------------|------------------|-----------|--------|-----------|
| Issue text | Length (words)   | 111.5     | 146    | 1,294     |
| Code base  | Files (Non-test) | 71.71     | 72.00  | 2,264     |
|            | Lines (Non-test) | 15,163.38 | 13,777 | 1,039,288 |
| Gold patch | Files edited     | 2.6       | 3      | 7         |
|            | Lines edited     | 56        | 76     | 300       |
| Tests      | Fail to Pass     | 10.94     | 5      | 4,941     |
|            | Total            | 58.5      | 49     | 7,820     |

# Dataset Structure

The dataset contains the following fields. It includes all fields from SWE-bench and adds a `meta` column, which indicates whether the instance meets the "lite" criteria and, if not, lists the failed validators.

| Field name                 | Type | Description |
|----------------------------|------|-------------|
| `instance_id`              | str  | A formatted instance identifier, usually as `repo_owner__repo_name-PR-number`. |
| `patch`                    | str  | The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue. |
| `repo`                     | str  | The repository owner/name identifier from GitHub. |
| `base_commit`              | str  | The commit hash of the repository representing the HEAD of the repository before the solution PR is applied. |
| `hints_text`               | str  | Comments made on the issue before the creation date of the solution PR’s first commit. |
| `created_at`               | str  | The creation date of the pull request. |
| `test_patch`               | str  | A test-file patch that was contributed by the solution PR. |
| `problem_statement`        | str  | The issue title and body. |
| `version`                  | str  | Installation version to use for running evaluation. |
| `environment_setup_commit` | str  | Commit hash to use for environment setup and installation. |
| `FAIL_TO_PASS`             | str  | A JSON list of strings that represent the set of tests resolved by the PR and tied to the issue resolution. |
| `PASS_TO_PASS`             | str  | A JSON list of strings that represent tests that should pass before and after the PR application. |
| `meta`                     | str  | A JSON dictionary indicating whether the instance is lite, along with a list of failed lite validators if it is not. |
| `license`                  | str  | The type of license of the repository. |

To execute instances within SWE-bench, you need to provide a default recipe for dependency installation. The constants required for running these instances are described in this [constants.py](https://huggingface.co/datasets/nebius/SWE-bench-extra/blob/main/constants.py).

# License

The dataset is licensed under the Creative Commons Attribution 4.0 license. However, please respect the license of each specific repository on which a particular instance is based. To facilitate this, the license of each repository at the time of the commit is provided for every instance.
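Because `FAIL_TO_PASS`, `PASS_TO_PASS`, and `meta` are stored as JSON-encoded strings rather than native lists or dictionaries, they need to be decoded before use. The following is a minimal sketch of that step, not an official loader. The record is hypothetical (every value is invented for illustration), and the `meta` key names `is_lite` and `failed_lite_validators` are assumptions based on the field description, so check them against real rows before relying on them.

```python
import json

# Hypothetical record shaped like one row of the dataset.
# All values are invented; the key names inside `meta` are an assumption.
row = {
    "instance_id": "example_owner__example_repo-123",
    "repo": "example_owner/example_repo",
    "FAIL_TO_PASS": '["tests/test_core.py::test_fix"]',
    "PASS_TO_PASS": '["tests/test_core.py::test_existing", "tests/test_io.py::test_read"]',
    "meta": '{"is_lite": false, "failed_lite_validators": ["has_many_modified_files"]}',
}

# Fields documented as JSON stored in a string column.
JSON_FIELDS = ("FAIL_TO_PASS", "PASS_TO_PASS", "meta")


def decode_row(row: dict) -> dict:
    """Return a copy of `row` with the JSON-encoded string fields parsed."""
    decoded = dict(row)
    for name in JSON_FIELDS:
        decoded[name] = json.loads(row[name])
    return decoded


decoded = decode_row(row)

# Tests that must pass after the patch: the newly fixed ones plus the
# ones that were already passing.
tests_to_run = decoded["FAIL_TO_PASS"] + decoded["PASS_TO_PASS"]

# Use .get so a missing key does not raise if the assumed name is wrong.
is_lite = decoded["meta"].get("is_lite", False)

print(len(tests_to_run), is_lite)  # prints: 3 False
```

The same `decode_row` function could be applied to each row obtained from `load_dataset('nebius/SWE-bench-extra')`. It returns a copy, so the original string fields stay untouched for serialization.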

Provider:
maas
Created:
2024-12-22