SWE-bench_bm25_13K

Name: SWE-bench_bm25_13K
Creator: maas
Published: 2025-12-05 16:46:29
License: No description available

ModelScope Community, updated 2025-12-05, indexed 2025-08-23

Download link:

https://modelscope.cn/datasets/princeton-nlp/SWE-bench_bm25_13K


Resource summary:

# Dataset Card for "SWE-bench_bm25_13K"

### Dataset Summary

SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification, using post-PR behavior as the reference solution.

The dataset was released as part of [SWE-bench: Can Language Models Resolve Real-World GitHub Issues?](https://arxiv.org/abs/2310.06770)

This dataset, `SWE-bench_bm25_13K`, formats each instance using Pyserini's BM25 retrieval as described in the paper. The code context is limited to 13,000 `cl100k_base` tokens, as counted by the [`tiktoken`](https://github.com/openai/tiktoken) tokenization package used for OpenAI models.

The `text` column can be used directly with LMs to generate patch files. Models are instructed to generate a [`patch`](https://en.wikipedia.org/wiki/Patch_(Unix))-formatted file using the following template:

```diff
<patch>
--- a/path/to/file.py
+++ b/path/to/file.py
@@ -1,3 +1,3 @@
 This is a test file.
-It contains several lines.
+It has been modified.
 This is the third line.
</patch>
```

This format can be used directly with the [SWE-bench inference scripts](https://github.com/princeton-nlp/SWE-bench/tree/main/inference). Please refer to these scripts for more details on inference.

### Supported Tasks and Leaderboards

SWE-bench proposes a new task: issue resolution given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com.

### Languages

The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.

## Dataset Structure

### Data Instances

Each SWE-bench instance has the following fields:

```
instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
text: (str) - The input text, including instructions, the code context retrieved with BM25, and an example of the patch format for output.
patch: (str) - The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash of the repository representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue before the creation date of the solution PR’s first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - Commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings that represent the set of tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings that represent tests that should pass before and after the PR application.
```

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
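If you are wiring this format into your own tooling, the following minimal Python sketch (standard library only) shows the string handling involved. It pulls the diff out of a model response that follows the `<patch>` template, lists the files that diff touches, splits an `instance_id` of the usual `repo_owner__repo_name-PR-number` shape, and decodes the JSON-encoded `FAIL_TO_PASS` and `PASS_TO_PASS` fields. The helper names (`extract_patch`, `touched_files`, `split_instance_id`, `decode_tests`), the regex, and the sample row are hypothetical. This is not code from the SWE-bench inference scripts or evaluation harness, which remain the reference for applying and testing patches.

```python
import json
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: assumes the response wraps its diff as "<patch>\n...</patch>",
# mirroring the prompt template; only the first such block is returned.
PATCH_RE = re.compile(r"<patch>\n(.*?)</patch>", re.DOTALL)


def extract_patch(model_output: str) -> Optional[str]:
    """Return the diff text between <patch> and </patch>, or None if absent."""
    match = PATCH_RE.search(model_output)
    return match.group(1) if match else None


def touched_files(diff: str) -> List[str]:
    """Collect target paths from the '+++ b/<path>' headers of a unified diff."""
    prefix = "+++ b/"
    return [line[len(prefix):].strip() for line in diff.splitlines() if line.startswith(prefix)]


def split_instance_id(instance_id: str) -> Tuple[str, str, int]:
    """Split 'owner__repo-123' into ('owner', 'repo', 123).

    Repository names may contain '-', so only the last '-' separates the PR number.
    """
    owner, rest = instance_id.split("__", 1)
    repo, number = rest.rsplit("-", 1)
    return owner, repo, int(number)


def decode_tests(row: dict) -> Tuple[List[str], List[str]]:
    """FAIL_TO_PASS and PASS_TO_PASS hold JSON-encoded lists stored as strings."""
    return json.loads(row["FAIL_TO_PASS"]), json.loads(row["PASS_TO_PASS"])


if __name__ == "__main__":
    # Hypothetical row and model response, for illustration only.
    row = {
        "instance_id": "my-org__my-repo-42",
        "FAIL_TO_PASS": '["tests/test_core.py::test_fix"]',
        "PASS_TO_PASS": '["tests/test_core.py::test_existing"]',
    }
    response = (
        "Here is my fix.\n"
        "<patch>\n"
        "--- a/path/to/file.py\n"
        "+++ b/path/to/file.py\n"
        "@@ -1,3 +1,3 @@\n"
        " This is a test file.\n"
        "-It contains several lines.\n"
        "+It has been modified.\n"
        " This is the third line.\n"
        "</patch>\n"
    )
    diff = extract_patch(response)
    print(touched_files(diff))                    # ['path/to/file.py']
    print(split_instance_id(row["instance_id"]))  # ('my-org', 'my-repo', 42)
    fail_to_pass, pass_to_pass = decode_tests(row)
    print(fail_to_pass, pass_to_pass)
```

The sketch splits the identifier on the last `-` so that hyphenated repository names still parse. It keeps the trailing newline of the extracted diff because patch tools generally expect complete final lines.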

Provider:

maas

Created:

2025-08-16
