SWE-bench_Pro

Name: SWE-bench_Pro
Creator: maas
Published: 2026-05-16 22:17:22
License: No description available

ModelScope Community, updated 2026-05-16, indexed 2025-09-27

Download link:

https://modelscope.cn/datasets/ScaleAI/SWE-bench_Pro


Resource description:

## Dataset Summary

SWE-Bench Pro is a challenging, enterprise-level dataset for testing agent ability on long-horizon software engineering tasks.

Paper: https://static.scale.com/uploads/654197dc94d34f66c0f5184e/SWEAP_Eval_Scale%20(9).pdf

See the related evaluation GitHub: https://github.com/scaleapi/SWE-bench_Pro-os

## Dataset Structure

We follow SWE-Bench Verified (https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified) in terms of dataset structure, with several extra fields.

## Data Fields

- repo (string): Repository identifier - one of 11 repository classes
- instance_id (string): Unique identifier for each instance (65-120 characters)
- base_commit (string): Git commit hash of the base version (40 characters)
- patch (string): The golden code patch/diff (1.44k - 180k characters)
- test_patch (string): Test cases related to the patch (325 - 322k characters)
- problem_statement (string): Description of the issue being addressed (419 - 8.04k characters)
- requirements (string): Project requirements or dependencies (124 - 6.7k characters, may be null)
- interface (string): API or interface specifications (1 - 12.2k characters, may be null)
- repo_language (string): Programming language of the repository - one of 4 language classes
- fail_to_pass (string): Test cases that should pass after patch application (10 - 155k characters)
- pass_to_pass (string): Test cases that should continue passing (2 - 532k characters)
- issue_specificity (string): Specificity of the issue (12-77 characters)
- issue_categories (string): Categories or tags for the issue type
- before_repo_set_cmd (string): Repo set command for testing
- selected_test_files_to_run (string): Files selected for testing
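The field list above can be turned into a small sanity check for a single record. The sketch below is illustrative only: the helper names `check_record` and `parse_test_list` and the sample record are hypothetical, and it assumes (the card does not say so) that `fail_to_pass` and `pass_to_pass` hold a string-encoded list of test names, either as a JSON array or as a Python list literal.

```python
import ast
import json

# Field names as listed in the dataset card.
EXPECTED_FIELDS = [
    "repo", "instance_id", "base_commit", "patch", "test_patch",
    "problem_statement", "requirements", "interface", "repo_language",
    "fail_to_pass", "pass_to_pass", "issue_specificity", "issue_categories",
    "before_repo_set_cmd", "selected_test_files_to_run",
]
# The card marks only these two fields as "may be null".
NULLABLE_FIELDS = {"requirements", "interface"}


def check_record(record):
    """Return a list of problems found in one record (empty list means OK)."""
    problems = []
    for name in EXPECTED_FIELDS:
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if value is None:
            if name not in NULLABLE_FIELDS:
                problems.append(f"unexpected null: {name}")
        elif not isinstance(value, str):
            problems.append(f"not a string: {name}")
    commit = record.get("base_commit")
    if isinstance(commit, str) and len(commit) != 40:
        problems.append("base_commit is not 40 characters")
    return problems


def parse_test_list(raw):
    """Decode a string-encoded list of test names.

    ASSUMPTION: the string is a JSON array or a Python list literal;
    the dataset card only states that the field is a string.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = ast.literal_eval(raw)
    if not isinstance(parsed, list):
        raise ValueError("expected a list of test names")
    return [str(item) for item in parsed]


# Hypothetical record for illustration; none of these values come from the dataset.
record = {name: "placeholder" for name in EXPECTED_FIELDS}
record.update({
    "base_commit": "0" * 40,
    "requirements": None,  # allowed to be null
    "fail_to_pass": '["tests/test_a.py::test_fix"]',
    "pass_to_pass": "['tests/test_a.py::test_old', 'tests/test_b.py::test_other']",
})

print(check_record(record))
print(parse_test_list(record["fail_to_pass"]))
print(parse_test_list(record["pass_to_pass"]))
```

A check like this could be run over each record after downloading the dataset, before handing instances to an evaluation harness. The length ranges quoted in the card are left out on purpose, since they describe the published data rather than hard constraints.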


Provider:

maas

Created:

2025-09-22
