SWE-bench_Lite
ModelScope community (updated 2026-05-22, indexed 2025-04-19)
Download link:
https://modelscope.cn/datasets/princeton-nlp/SWE-bench_Lite
Resource description:
### Dataset Summary
SWE-bench *Lite* is a _subset_ of [SWE-bench](https://huggingface.co/datasets/princeton-nlp/SWE-bench), a dataset that tests systems’ ability to resolve GitHub issues automatically. The dataset collects 300 test Issue-Pull Request pairs from 11 popular Python repositories. Evaluation is performed by unit-test verification, using the post-PR behavior as the reference solution.
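The card does not state the pass criterion explicitly. Below is a minimal sketch that assumes the rule used by the SWE-bench evaluation harness: an instance counts as resolved only if, after the candidate patch is applied, every test in its `FAIL_TO_PASS` list passes and every test in its `PASS_TO_PASS` list still passes. Both fields are described under Data Instances. The function name `is_resolved` and the shape of the `results` mapping are hypothetical.

```python
from collections.abc import Iterable, Mapping


def is_resolved(
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
    results: Mapping[str, bool],
) -> bool:
    """Hypothetical helper: True only if every required test passed.

    `results` maps a test identifier to whether it passed after the candidate
    patch was applied (assumed shape). A test missing from `results` is
    treated as a failure, so an incomplete test run never counts as resolved.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(results.get(test, False) for test in required)


if __name__ == "__main__":
    # Hypothetical test names and outcomes, for illustration only.
    f2p = ["tests/test_core.py::test_bug_fixed"]
    p2p = ["tests/test_core.py::test_existing_behavior"]
    print(is_resolved(f2p, p2p, {f2p[0]: True, p2p[0]: True}))   # True
    print(is_resolved(f2p, p2p, {f2p[0]: True, p2p[0]: False}))  # False: a regression
```

Requiring `PASS_TO_PASS` as well as `FAIL_TO_PASS` means a patch that fixes the issue but breaks existing behavior is not counted as a resolution.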
The dataset was released as part of [SWE-bench: Can Language Models Resolve Real-World GitHub Issues?](https://arxiv.org/abs/2310.06770)
## Want to run inference now?
For inference, this dataset provides only the `problem_statement` (i.e. the issue text) and the `base_commit`, which represents the state of the codebase before the issue was resolved. If you want to run inference using the "Oracle" or BM25 retrieval settings mentioned in the paper, consider the following datasets:
- [princeton-nlp/SWE-bench_Lite_oracle](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_oracle)
- [princeton-nlp/SWE-bench_Lite_bm25_13K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_bm25_13K)
- [princeton-nlp/SWE-bench_Lite_bm25_27K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_bm25_27K)
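The following is a minimal sketch of preparing inference inputs from this dataset. It assumes the Hugging Face `datasets` library is installed and that the 300 instances are published as the `test` split. The sketch keeps only the `problem_statement` and turns `repo` plus `base_commit` into git commands that recreate the codebase before the fix. The helper names (`load_swe_bench_lite`, `inference_input`, `checkout_commands`) and the `workdir` parameter are hypothetical.

```python
import shlex


def load_swe_bench_lite(split: str = "test"):
    """Download SWE-bench Lite from the Hugging Face Hub.

    Needs network access and `pip install datasets`; the split name is an assumption.
    """
    from datasets import load_dataset  # imported lazily so the helpers below work without it

    return load_dataset("princeton-nlp/SWE-bench_Lite", split=split)


def inference_input(instance: dict) -> dict:
    """Keep only the fields a model is given at inference time (no gold patch, no tests)."""
    return {
        "instance_id": instance["instance_id"],
        "repo": instance["repo"],
        "base_commit": instance["base_commit"],
        "problem_statement": instance["problem_statement"],
    }


def checkout_commands(instance: dict, workdir: str = "repo") -> list[str]:
    """Shell commands that recreate the repository at `base_commit`, before the fix."""
    url = f"https://github.com/{instance['repo']}.git"
    return [
        f"git clone {shlex.quote(url)} {shlex.quote(workdir)}",
        f"git -C {shlex.quote(workdir)} checkout {shlex.quote(instance['base_commit'])}",
    ]


if __name__ == "__main__":
    # Hypothetical instance; real rows would come from load_swe_bench_lite().
    example = {
        "instance_id": "example-owner__example-repo-1234",
        "repo": "example-owner/example-repo",
        "base_commit": "0123abcd",
        "problem_statement": "Example issue text",
        "patch": "gold patch, not shown to the model",
    }
    print(inference_input(example)["problem_statement"])
    for command in checkout_commands(example):
        print(command)
```

Iterating over the object returned by `load_swe_bench_lite()` yields one dict per instance, and each dict can be passed to the two helpers.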
### Supported Tasks and Leaderboards
SWE-bench proposes a new task: resolving a GitHub issue given the issue and the full repository. The leaderboard can be found at www.swebench.com.
### Languages
The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.
## Dataset Structure
### Data Instances
Each SWE-bench instance contains the following fields:
```
instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
patch: (str) - The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash of the repository representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue before the creation date of the solution PR’s first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - Commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings representing the tests that fail before and pass after the PR is applied; these are tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings representing the tests that should pass both before and after the PR is applied.
```
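Two fields need decoding before use. `instance_id` packs the repository and PR number into one string, and the two test lists are stored as JSON-encoded strings rather than native lists. The following minimal sketch decodes both, based only on the formats described above. The names `InstanceId`, `parse_instance_id`, and `parse_test_list` are hypothetical, and so is the example record.

```python
import json
from typing import NamedTuple


class InstanceId(NamedTuple):
    owner: str
    name: str
    pr_number: int


def parse_instance_id(instance_id: str) -> InstanceId:
    """Split an id of the form repo_owner__repo_name-PR-number.

    The owner ends at the first double underscore; the PR number follows the
    last hyphen, so repository names that contain hyphens stay intact.
    """
    owner, sep, rest = instance_id.partition("__")
    name, dash, number = rest.rpartition("-")
    if not sep or not dash or not number.isdigit():
        raise ValueError(f"unexpected instance_id format: {instance_id!r}")
    return InstanceId(owner, name, int(number))


def parse_test_list(field: str) -> list[str]:
    """Decode a FAIL_TO_PASS or PASS_TO_PASS value (a JSON list of strings)."""
    tests = json.loads(field)
    if not isinstance(tests, list) or not all(isinstance(t, str) for t in tests):
        raise ValueError("expected a JSON list of strings")
    return tests


if __name__ == "__main__":
    # Hypothetical record with the field types described above.
    record = {
        "instance_id": "example-owner__example-repo-1234",
        "repo": "example-owner/example-repo",
        "FAIL_TO_PASS": '["tests/test_example.py::test_fixed"]',
        "PASS_TO_PASS": "[]",
    }
    iid = parse_instance_id(record["instance_id"])
    print(iid)  # InstanceId(owner='example-owner', name='example-repo', pr_number=1234)
    assert f"{iid.owner}/{iid.name}" == record["repo"]
    print(parse_test_list(record["FAIL_TO_PASS"]))  # ['tests/test_example.py::test_fixed']
```

The card says the identifier only *usually* follows this pattern, so the parser raises `ValueError` on unexpected input instead of guessing.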
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provided by: maas
Created: 2025-08-15
### Background overview
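SWE-bench_Lite is a subset of SWE-bench containing 300 Issue-Pull Request pairs from 11 popular Python projects, used to test systems' ability to resolve GitHub issues automatically. It provides only the problem statement and the base commit, and it is evaluated with unit tests that verify the post-PR behavior, focusing on real-world code-repair scenarios.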
The above summary was collected and generated by 遇见数据集.



