Multi-SWE-bench

Name: Multi-SWE-bench
Creator: Multi-SWE-bench contributors
License: No description available

Source: arXiv (indexed 2025-09-30)

Download link:

https://github.com/multi-swe-bench/Multi-SWE-bench

Resource overview:

Multi-SWE-bench is an issue-resolving benchmark covering Java, TypeScript, JavaScript, Go, Rust, C, and C++. It comprises 1,632 high-quality instances, annotated by 68 expert annotators, and is intended to evaluate the performance of large language models (LLMs) across diverse software ecosystems.

Provided by:

Multi-SWE-bench contributors
