SWE-bench_bm25_27K
Source: ModelScope community (updated 2025-11-27, indexed 2025-08-23)
Download link: https://modelscope.cn/datasets/princeton-nlp/SWE-bench_bm25_27K
Resource description:
# Dataset Card for "SWE-bench_bm25_27K"
### Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification, using post-PR behavior as the reference solution.
The dataset was released as part of [SWE-bench: Can Language Models Resolve Real-World GitHub Issues?](https://arxiv.org/abs/2310.06770)
This dataset `SWE-bench_bm25_27K` includes a formatting of each instance using Pyserini's BM25 retrieval as described in the paper. The code context size limit is 27,000 `cl100k_base` tokens from the [`tiktoken`](https://github.com/openai/tiktoken) tokenization package used for OpenAI models.
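To illustrate how a fixed token budget constrains the retrieved context, here is a minimal sketch that keeps already-ranked files in order until the limit is reached. The names `pack_context` and `whitespace_token_count` are hypothetical, and the whitespace-based counter is only a stand-in for the `cl100k_base` encoding; the actual SWE-bench pipeline may select and format files differently.

```python
from typing import Callable, Iterable, List, Tuple


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated words."""
    return len(text.split())


def pack_context(
    ranked_files: Iterable[Tuple[str, str]],
    max_tokens: int = 27_000,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Greedily keep files in rank order while they fit within max_tokens.

    ranked_files yields (path, content) pairs, best match first.
    Returns the paths of the files that were kept.
    """
    kept: List[str] = []
    used = 0
    for path, content in ranked_files:
        cost = count_tokens(content)
        if used + cost > max_tokens:
            break  # stop at the first file that would exceed the budget
        kept.append(path)
        used += cost
    return kept


# Toy example with made-up files and a tiny budget.
files = [("a.py", "x = 1"), ("b.py", "y = 2 + 3"), ("c.py", "z = 0")]
print(pack_context(files, max_tokens=8))
```

To count real `cl100k_base` tokens instead, one could pass `count_tokens=lambda s: len(enc.encode(s))` with `enc = tiktoken.get_encoding("cl100k_base")`, assuming `tiktoken` is installed.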
The `text` column can be used directly with LMs to generate patch files.
Models are instructed to generate a [`patch`](https://en.wikipedia.org/wiki/Patch_(Unix))-formatted file using the following template:
```diff
<patch>
diff
--- a/path/to/file.py
+++ b/path/to/file.py
@@ -1,3 +1,3 @@
 This is a test file.
-It contains several lines.
+It has been modified.
 This is the third line.
</patch>
```
This format can be used directly with the [SWE-bench inference scripts](https://github.com/princeton-nlp/SWE-bench/tree/main/inference). Please refer to these scripts for more details on inference.
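As a rough illustration of how a response in this format might be post-processed, the sketch below pulls the diff out of the `<patch>` tags. The helper name `extract_patch` and the sample response are hypothetical; the official inference scripts may handle extraction differently.

```python
import re
from typing import Optional

# Lazily match everything between the first <patch> tag and its closing tag.
PATCH_RE = re.compile(r"<patch>\n?(.*?)</patch>", re.DOTALL)


def extract_patch(model_output: str) -> Optional[str]:
    """Return the text between the first <patch> ... </patch> pair, or None."""
    match = PATCH_RE.search(model_output)
    return match.group(1) if match else None


# Made-up model response used only to exercise the helper.
sample = (
    "Here is the fix:\n"
    "<patch>\n"
    "--- a/f.py\n"
    "+++ b/f.py\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "</patch>\n"
    "Done."
)
print(extract_patch(sample))
```

Returning `None` when no tags are found lets the caller decide whether to skip the instance or fall back to another parsing strategy.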
### Supported Tasks and Leaderboards
SWE-bench proposes a new task: issue resolution given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com.
### Languages
The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.
## Dataset Structure
### Data Instances
Each SWE-bench instance has the following fields:
```
instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
text: (str) - The input text including instructions, the BM25-retrieved files, and an example of the patch format for output.
patch: (str) - The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash of the repository representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue before the creation date of the solution PR’s first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - The commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings that represents the set of tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings that represents the tests that should pass both before and after the PR is applied.
```
Provider: maas
Created: 2025-08-16



