Fsoft-AIC/SWE-EVO

Name: Fsoft-AIC/SWE-EVO
Creator: Fsoft-AIC
Published: 2025-12-24 07:56:39
License: No description available

Source: Hugging Face. Updated: 2025-12-24. Indexed: 2026-01-03.

Download link:

https://hf-mirror.com/datasets/Fsoft-AIC/SWE-EVO

Overview:

SWE-EVO is a benchmark designed to evaluate AI coding agents in autonomous software evolution tasks. Unlike benchmarks that focus on isolated coding problems, SWE-EVO simulates realistic scenarios in which agents must iteratively evolve complex codebases according to high-level software requirement specifications (SRS). Using versioned histories from real Python open-source projects (such as Django and NumPy), SWE-EVO challenges agents to interpret high-level SRS, plan and implement multi-step changes, navigate large-scale repositories with thousands of files, and produce correct changes across multiple versions. The benchmark addresses the key research question: Given an existing codebase and evolving requirements, can AI agents autonomously perform sustained planning, adaptation, and evolution over long interactions?
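
As a rough illustration of how the dataset might be fetched through the mirror listed above, the following is a minimal sketch, not an official loader. It assumes the Hugging Face `datasets` package is installed and that the mirror honors the `HF_ENDPOINT` environment variable read by `huggingface_hub`. The helper names `dataset_url` and `load_swe_evo` are hypothetical, and no split, configuration, or column names are assumed because this page does not state them.

```python
import os

REPO_ID = "Fsoft-AIC/SWE-EVO"      # dataset identifier shown on this page
MIRROR = "https://hf-mirror.com"   # mirror host taken from the download link


def dataset_url(repo_id: str = REPO_ID, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a given Hub endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_swe_evo(endpoint: str = MIRROR):
    """Load the dataset via the given endpoint (hypothetical helper).

    Requires `pip install datasets` and network access.
    """
    # HF_ENDPOINT is read when huggingface_hub is first imported, so set it
    # beforehand; setdefault keeps any value the user has already exported.
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from datasets import load_dataset  # lazy import so the env var takes effect

    # If the repository defines several configurations, load_dataset will ask
    # for a config name; none is assumed here.
    return load_dataset(REPO_ID)
```

Calling `dataset_url()` returns the mirror link listed on this page. Calling `load_swe_evo()` and printing the result is one way to inspect which splits and columns the dataset actually provides before relying on any of them.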

Provider:

Fsoft-AIC
