Multi-SWE-bench
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/multi-swe-bench/Multi-SWE-bench
Resource description:
Multi-SWE-bench is a problem-solving benchmark covering multiple programming languages, including Java, TypeScript, JavaScript, Go, Rust, C, and C++. It comprises 1,632 high-quality instances, annotated by 68 expert annotators, and is intended to evaluate how large language models (LLMs) perform across diverse software ecosystems.
Provided by:
Multi-SWE-bench contributors



