Fsoft-AIC/SWE-EVO
Source: Hugging Face. Updated: 2025-12-24. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/Fsoft-AIC/SWE-EVO
Description:
SWE-EVO is a benchmark designed to evaluate AI coding agents in autonomous software evolution tasks. Unlike benchmarks that focus on isolated coding problems, SWE-EVO simulates realistic scenarios in which agents must iteratively evolve complex codebases according to high-level software requirement specifications (SRS). Using versioned histories from real Python open-source projects (such as Django and NumPy), SWE-EVO challenges agents to interpret high-level SRS, plan and implement multi-step changes, navigate large-scale repositories with thousands of files, and produce correct changes across multiple versions. The benchmark addresses the key research question: Given an existing codebase and evolving requirements, can AI agents autonomously perform sustained planning, adaptation, and evolution over long interactions?
Provider:
Fsoft-AIC



