inference-optimization/SWE-bench_Lite

Name: inference-optimization/SWE-bench_Lite
Creator: inference-optimization
Published: 2026-03-10 21:01:12
License: No description available

Source: Hugging Face. Updated: 2026-03-10. Indexed: 2026-04-05.

Download link:

https://hf-mirror.com/datasets/inference-optimization/SWE-bench_Lite


Resource description:

---
dataset_info:
  features:
  - name: repo
    dtype: string
  - name: instance_id
    dtype: string
  - name: base_commit
    dtype: string
  - name: patch
    dtype: string
  - name: test_patch
    dtype: string
  - name: problem_statement
    dtype: string
  - name: hints_text
    dtype: string
  - name: created_at
    dtype: string
  - name: version
    dtype: string
  - name: FAIL_TO_PASS
    dtype: string
  - name: PASS_TO_PASS
    dtype: string
  - name: environment_setup_commit
    dtype: string
  - name: image_name
    dtype: string
  splits:
  - name: dev
    num_bytes: 233879
    num_examples: 23
  - name: test
    num_bytes: 3541430
    num_examples: 300
  download_size: 1221577
  dataset_size: 3775309
configs:
- config_name: default
  data_files:
  - split: dev
    path: data/dev-*
  - split: test
    path: data/test-*
---

### Dataset Summary

SWE-bench *Lite* is a _subset_ of [SWE-bench](https://huggingface.co/datasets/princeton-nlp/SWE-bench), a dataset that tests systems' ability to solve GitHub issues automatically. The dataset collects 300 test Issue-Pull Request pairs from 11 popular Python repositories. Evaluation is performed by unit test verification, using post-PR behavior as the reference solution.

The dataset was released as part of [SWE-bench: Can Language Models Resolve Real-World GitHub Issues?](https://arxiv.org/abs/2310.06770)

## Want to run inference now?

This dataset only contains the `problem_statement` (i.e. the issue text) and the `base_commit`, which represents the state of the codebase before the issue was resolved. If you want to run inference using the "Oracle" or BM25 retrieval settings mentioned in the paper, consider the following datasets.

[princeton-nlp/SWE-bench_Lite_oracle](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_oracle)

[princeton-nlp/SWE-bench_Lite_bm25_13K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_bm25_13K)

[princeton-nlp/SWE-bench_Lite_bm25_27K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_bm25_27K)

### Supported Tasks and Leaderboards

SWE-bench proposes a new task: issue resolution, given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com

### Languages

The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.

## Dataset Structure

### Data Instances

An example of a SWE-bench datum is as follows:

```
instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
patch: (str) - The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash of the repository representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue before the creation date of the solution PR's first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - Commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings that represent the set of tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings that represent tests that should pass before and after the PR application.
```

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
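Because `FAIL_TO_PASS` and `PASS_TO_PASS` are stored as JSON-encoded strings rather than native lists, a consumer has to decode them before use. The sketch below shows one way to do that, plus a simple resolution check. The record values, the helper names `decode_test_lists` and `is_resolved`, and the exact pass criterion (every listed test must pass) are assumptions made for illustration; they are not taken from this dataset card or from the official evaluation harness.

```python
import json

# Hypothetical record: the field names follow the dataset card above,
# but every value here is made up for illustration.
example = {
    "instance_id": "example_owner__example_repo-123",
    "repo": "example_owner/example_repo",
    "FAIL_TO_PASS": '["tests/test_core.py::test_bug_fixed"]',
    "PASS_TO_PASS": (
        '["tests/test_core.py::test_existing_a", '
        '"tests/test_core.py::test_existing_b"]'
    ),
}


def decode_test_lists(record):
    """Decode the two JSON-encoded test-name lists into Python lists."""
    fail_to_pass = json.loads(record["FAIL_TO_PASS"])
    pass_to_pass = json.loads(record["PASS_TO_PASS"])
    return fail_to_pass, pass_to_pass


def is_resolved(record, passed_tests):
    """Assumed criterion: all FAIL_TO_PASS and PASS_TO_PASS tests passed."""
    fail_to_pass, pass_to_pass = decode_test_lists(record)
    passed = set(passed_tests)
    return all(name in passed for name in fail_to_pass + pass_to_pass)


fail_to_pass, pass_to_pass = decode_test_lists(example)
print(len(fail_to_pass), len(pass_to_pass))
print(is_resolved(example, fail_to_pass + pass_to_pass))
```

Real records could presumably be obtained with the Hugging Face `datasets` library, for example `load_dataset("inference-optimization/SWE-bench_Lite", split="test")`, and passed to the same helpers one row at a time.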

