
Fsoft-AIC/SWE-EVO

Source: Hugging Face. Updated 2025-12-24; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/Fsoft-AIC/SWE-EVO
Description:

SWE-EVO is a benchmark designed to evaluate AI coding agents in autonomous software evolution tasks. Unlike benchmarks that focus on isolated coding problems, SWE-EVO simulates realistic scenarios in which agents must iteratively evolve complex codebases according to high-level software requirement specifications (SRS). Using versioned histories from real Python open-source projects (such as Django and NumPy), SWE-EVO challenges agents to interpret high-level SRS, plan and implement multi-step changes, navigate large-scale repositories with thousands of files, and produce correct changes across multiple versions. The benchmark addresses the key research question: Given an existing codebase and evolving requirements, can AI agents autonomously perform sustained planning, adaptation, and evolution over long interactions?
Provider:
Fsoft-AIC