Kwaipilot/SWE-Compass
Source: Hugging Face. Updated: 2025-12-24. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/Kwaipilot/SWE-Compass
Resource overview:
SWE-Compass is a unified evaluation framework for assessing the software engineering capabilities of large language models (LLMs). It addresses the limitations of current evaluations by covering a broad range of task types, programming scenarios, and languages. The dataset contains 2,000 high-quality instances drawn from real GitHub pull requests and supports multi-dimensional performance comparison across task types, scenarios, and languages. By combining heterogeneous code tasks with real engineering practice, SWE-Compass provides a reproducible, rigorous, production-oriented benchmark for diagnosing and improving the software engineering capabilities of LLMs.
Provider:
Kwaipilot



