SWE-bench

Name: SWE-bench
Creator: maas
Published: 2026-05-15 20:26:14
License: No description available

ModelScope community · Updated 2026-05-15 · Indexed 2024-05-15

Download link:

https://modelscope.cn/datasets/AI-ModelScope/SWE-bench


Resource description:

### Dataset Summary

SWE-bench is a dataset that tests systems' ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution.

The dataset was released as part of [SWE-bench: Can Language Models Resolve Real-World GitHub Issues?](https://arxiv.org/abs/2310.06770)

## Want to run inference now?

This dataset only contains the `problem_statement` (i.e. the issue text) and the `base_commit`, which represents the state of the codebase before the issue was resolved. If you want to run inference using the "Oracle" or BM25 retrieval settings mentioned in the paper, consider the following datasets.

[princeton-nlp/SWE-bench_oracle](https://huggingface.co/datasets/princeton-nlp/SWE-bench_oracle)

[princeton-nlp/SWE-bench_bm25_13K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_13K)

[princeton-nlp/SWE-bench_bm25_27K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_27K)

[princeton-nlp/SWE-bench_bm25_40K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_40K)

[princeton-nlp/SWE-bench_bm25_50k_llama](https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_50k_llama)

### Supported Tasks and Leaderboards

SWE-bench proposes a new task: issue resolution given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com

### Languages

The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.

## Dataset Structure

### Data Instances

A SWE-bench datum has the following fields:

```
instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
patch: (str) - The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash of the repository representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue before the creation date of the solution PR's first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - Commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings that represent the set of tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings that represent tests that should pass before and after the PR application.
```
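Because `FAIL_TO_PASS` and `PASS_TO_PASS` are stored as JSON-encoded strings rather than lists, they need decoding before use. The sketch below shows one way to do that, along with a simple reading of the field descriptions: an instance counts as resolved when every test in both lists passes after a candidate patch is applied. The record values, the helper names (`decode_tests`, `split_instance_id`, `is_resolved`) and the `passed_tests` argument are all hypothetical and are not part of any official evaluation harness.

```python
import json

# Hypothetical record for illustration only: the field names follow the
# card's field listing, but every value here is made up.
example = {
    "instance_id": "owner__repo-123",
    "repo": "owner/repo",
    "FAIL_TO_PASS": '["tests/test_a.py::test_fix"]',
    "PASS_TO_PASS": '["tests/test_a.py::test_old", "tests/test_b.py::test_other"]',
}


def decode_tests(record):
    """Decode the two JSON-encoded test lists into Python lists."""
    fail_to_pass = json.loads(record["FAIL_TO_PASS"])
    pass_to_pass = json.loads(record["PASS_TO_PASS"])
    return fail_to_pass, pass_to_pass


def split_instance_id(instance_id):
    """Split 'repo_owner__repo_name-PR-number' into (owner, name, number).

    Assumes the usual format; the card only says "usually", so an id with
    a different shape will raise ValueError here.
    """
    owner, rest = instance_id.split("__", 1)
    name, pr_number = rest.rsplit("-", 1)
    return owner, name, int(pr_number)


def is_resolved(record, passed_tests):
    """Return True when every listed test appears in `passed_tests`.

    `passed_tests` is assumed to be the set of test names that your own
    test runner reported as passing after applying a candidate patch.
    """
    fail_to_pass, pass_to_pass = decode_tests(record)
    return all(test in passed_tests for test in fail_to_pass + pass_to_pass)
```

For example, `is_resolved(example, passed)` is true only if `passed` contains all three test names from the hypothetical record; a patch that fixes the target test but breaks a `PASS_TO_PASS` test would not count.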


Provider:

maas

Created:

2024-04-10
