
SWE-bench_oracle_llama

ModelScope community · Updated 2025-11-27 · Indexed 2025-08-16
Download link:
https://modelscope.cn/datasets/princeton-nlp/SWE-bench_oracle_llama
Resource description:
### Dataset Summary

SWE-bench is a dataset that tests systems' ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification, using post-PR behavior as the reference solution.

### Supported Tasks and Leaderboards

SWE-bench proposes a new task: issue resolution, given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com

### Languages

The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.

## Dataset Structure

### Data Instances

An example of a SWE-bench datum is as follows:

```
instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
patch: (str) - The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash of the repository representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue before the creation date of the solution PR's first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - Commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings that represent the set of tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings that represent tests that should pass before and after the PR application.
text: (str) - The generated text according to the retrieval criterion and the style-2 prompt found in [github:SWE-bench](https://github.com/princeton-nlp/SWE-bench).
input_ids: (List[int]) - The Llama token IDs for each text.
```

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
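
Because `FAIL_TO_PASS` and `PASS_TO_PASS` are stored as JSON-encoded strings rather than native lists, they need to be decoded before use. The following is a minimal sketch of how a record might be handled, not the official evaluation harness. The record contents, the helper names `parse_test_list` and `is_resolved`, and the rule that an instance counts as resolved only when every listed test passes are all assumptions made for illustration.

```python
import json
from typing import Dict, List, Set


def parse_test_list(raw: str) -> List[str]:
    """Decode a FAIL_TO_PASS / PASS_TO_PASS field (a JSON list of strings)."""
    tests = json.loads(raw)
    if not isinstance(tests, list) or not all(isinstance(t, str) for t in tests):
        raise ValueError("expected a JSON list of strings")
    return tests


def is_resolved(record: Dict[str, str], passed_tests: Set[str]) -> bool:
    """Assumed rule: every FAIL_TO_PASS and PASS_TO_PASS test must have passed."""
    required = parse_test_list(record["FAIL_TO_PASS"]) + parse_test_list(
        record["PASS_TO_PASS"]
    )
    return all(test in passed_tests for test in required)


# Hypothetical record: these values are invented and only mirror the field layout.
example_record = {
    "instance_id": "example_owner__example_repo-123",
    "repo": "example_owner/example_repo",
    "FAIL_TO_PASS": '["tests/test_core.py::test_bug"]',
    "PASS_TO_PASS": '["tests/test_core.py::test_existing", "tests/test_io.py::test_read"]',
}

# Hypothetical test outcomes after applying a candidate patch.
all_passed = {
    "tests/test_core.py::test_bug",
    "tests/test_core.py::test_existing",
    "tests/test_io.py::test_read",
}
regression = all_passed - {"tests/test_io.py::test_read"}

print(is_resolved(example_record, all_passed))
print(is_resolved(example_record, regression))
```

In this sketch, a patch that fixes the target test but breaks a previously passing one is not counted as resolved, which is one reading of why both test lists are recorded.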

Provider:
maas
Created:
2025-08-15