Kwaipilot/SWE-Compass

Name: Kwaipilot/SWE-Compass
Creator: Kwaipilot
Published: 2025-12-24 07:23:16
License: No description available

Source: Hugging Face. Updated 2025-12-24. Indexed 2026-01-03.

Download link:

https://hf-mirror.com/datasets/Kwaipilot/SWE-Compass

Description:

SWE-Compass is a unified evaluation framework for assessing the software engineering capabilities of large language models (LLMs). It addresses the limitations of current evaluations by covering a wide range of task types, programming scenarios, and languages. The dataset contains 2000 high-quality instances sourced from real GitHub pull requests and supports multi-dimensional performance comparisons across task types, scenarios, and languages. By integrating heterogeneous code tasks with real engineering practices, SWE-Compass provides a reproducible, rigorous, and production-oriented benchmark for diagnosing and improving the software engineering capabilities of large language models.
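
The multi-dimensional comparison mentioned above can be illustrated with a minimal sketch. It assumes each evaluated instance yields a record tagged with a task type, a scenario, and a language, plus a pass/fail flag. The field names (task_type, scenario, language, resolved), the helper resolve_rate_by, and the sample records are all hypothetical: they are not taken from the SWE-Compass schema or from any reported results.

```python
from collections import defaultdict

# Hypothetical per-instance results. Field names and values are illustrative
# only; they are NOT the SWE-Compass schema and NOT real benchmark results.
results = [
    {"task_type": "bug_fixing", "scenario": "web", "language": "Python", "resolved": True},
    {"task_type": "bug_fixing", "scenario": "web", "language": "Java", "resolved": False},
    {"task_type": "feature_implementation", "scenario": "infra", "language": "Python", "resolved": True},
    {"task_type": "feature_implementation", "scenario": "web", "language": "Go", "resolved": False},
]


def resolve_rate_by(records, dimension):
    """Return {dimension value: fraction of records with resolved == True}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for record in records:
        key = record[dimension]
        totals[key] += 1
        passed[key] += int(record["resolved"])
    return {key: passed[key] / totals[key] for key in totals}


# Slice the same set of results along each dimension in turn.
for dim in ("task_type", "scenario", "language"):
    print(dim, resolve_rate_by(results, dim))
```

Grouping one flat list of records by each tag keeps the three views consistent with one another, since every slice is computed from the same underlying instances.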

Provider:

Kwaipilot
