
TuringEnterprises/SWE-Bench-plus-plus

Source: Hugging Face. Updated: 2025-12-30. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/TuringEnterprises/SWE-Bench-plus-plus
Description:
SWE-bench++ is a reimagined end-to-end evaluation framework that addresses pain points in existing software engineering evaluations and introduces new capabilities. The dataset contains 500 high-quality tasks spanning diverse programming languages and repository types, and over 80% of the tasks fall in the medium-to-hard difficulty range. On average, a task edits more than 120 lines of code (some exceed 1,000 lines) across more than 7 files. The dataset is built by an automated pipeline that combines scalable sourcing and filtering, intelligent data curation, agentic Dockerization, and automated quality control. SWE-bench++ sets a new standard for evaluating and training software reasoning capabilities and can be generalized to evaluate more holistic software engineering tasks.
Provider:
TuringEnterprises
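
For readers who want to fetch the data programmatically rather than through the download link, the following is a minimal sketch, not an official loader. It assumes the repository can be read with the Hugging Face `datasets` library and that the mirror host from the download link can be used as a Hub endpoint through the `HF_ENDPOINT` environment variable. The helper names `dataset_page_url` and `load_tasks` are hypothetical, and the listing does not state the split or column names, so inspect the returned object before relying on any of them.

```python
import os
from typing import Optional

DATASET_ID = "TuringEnterprises/SWE-Bench-plus-plus"  # repository name from the listing
MIRROR_ENDPOINT = "https://hf-mirror.com"  # host taken from the download link


def dataset_page_url(repo_id: str = DATASET_ID, endpoint: str = MIRROR_ENDPOINT) -> str:
    """Build the dataset page URL for a Hub-compatible endpoint."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_tasks(repo_id: str = DATASET_ID, endpoint: Optional[str] = None):
    """Download the dataset with the `datasets` library (needs network access).

    Assumption: the repository is loadable with its default configuration.
    Split and column names are not given in the listing, so print the
    result to see what is actually available.
    """
    if endpoint is not None:
        # huggingface_hub reads HF_ENDPOINT when it is first imported, so this
        # only takes effect if set before `datasets` is loaded in the process.
        os.environ["HF_ENDPOINT"] = endpoint
    from datasets import load_dataset  # lazy import; install with `pip install datasets`

    return load_dataset(repo_id)


# Example usage (commented out because it downloads data):
# tasks = load_tasks(endpoint=MIRROR_ENDPOINT)
# print(tasks)
```

Calling `load_tasks()` with no `endpoint` uses the default Hugging Face Hub; passing `MIRROR_ENDPOINT` routes the request through the mirror instead.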