Kwaipilot/SWE-Compass

Source: Hugging Face · Updated: 2025-12-24 · Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/Kwaipilot/SWE-Compass
Description:
SWE-Compass is a unified evaluation framework for assessing the software engineering capabilities of large language models (LLMs). It addresses the limitations of current evaluations by covering a wide range of task types, programming scenarios, and languages. The dataset contains 2000 high-quality instances sourced from real GitHub pull requests and supports multi-dimensional performance comparisons across task types, scenarios, and languages. By integrating heterogeneous code tasks with real engineering practices, SWE-Compass provides a reproducible, rigorous, and production-oriented benchmark for diagnosing and improving the software engineering capabilities of large language models.
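The multi-dimensional comparison described above amounts to grouping per-instance results by a metadata field and computing a success rate for each group. The sketch below shows that idea under stated assumptions. The record keys "task_type", "scenario", "language", and "resolved" are hypothetical and do not come from SWE-Compass's actual schema. The metric, the fraction of instances marked resolved, is likewise only an assumed pass-rate-style measure.

```python
from collections import defaultdict
from typing import Iterable, Mapping

# Hypothetical record schema (an assumption, not SWE-Compass's real fields):
#   "task_type", "scenario", "language": instance metadata used for grouping
#   "resolved": bool, whether the model's output passed that instance's checks
def resolve_rate_by(results: Iterable[Mapping], dimension: str) -> dict[str, float]:
    """Return the fraction of resolved instances for each value of `dimension`."""
    totals: dict[str, int] = defaultdict(int)
    solved: dict[str, int] = defaultdict(int)
    for record in results:
        key = record[dimension]
        totals[key] += 1
        solved[key] += int(bool(record["resolved"]))
    # Every key in totals has at least one instance, so division is safe.
    return {key: solved[key] / totals[key] for key in totals}

# Illustrative, made-up results for four instances.
results = [
    {"task_type": "bug_fix", "scenario": "web", "language": "Python", "resolved": True},
    {"task_type": "bug_fix", "scenario": "web", "language": "Java", "resolved": False},
    {"task_type": "feature", "scenario": "cli", "language": "Python", "resolved": True},
    {"task_type": "feature", "scenario": "web", "language": "Go", "resolved": False},
]

# One breakdown per dimension: task type, scenario, and language.
for dim in ("task_type", "scenario", "language"):
    print(dim, resolve_rate_by(results, dim))
```

Because each dimension is just a key, the same function covers task types, scenarios, and languages without separate code paths. Crossing two dimensions (for example, language within each task type) would only require grouping on a tuple of keys instead.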
Provider:
Kwaipilot