Multi-SWE-bench
Source: ModelScope community. Updated 2026-01-06; indexed 2025-04-19.
Download link:
https://modelscope.cn/datasets/ByteDance-Seed/Multi-SWE-bench
## 👋 Overview
This repository contains the Multi-SWE-bench dataset, introduced in [Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving](https://huggingface.co/papers/2504.02605) to address the lack of multilingual benchmarks for evaluating LLMs on real-world code issue resolution.
Unlike existing Python-centric benchmarks (e.g., SWE-bench), it spans 7 languages (Java, TypeScript, JavaScript, Go, Rust, C, and C++) and contains 1,632 high-quality instances, curated from 2,456 candidates by 68 expert annotators for reliability. The leaderboard can be found at:
https://multi-swe-bench.github.io
## ⚙️ Usage
```bash
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/ByteDance-Seed/Multi-SWE-bench
```
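Once the clone finishes, a quick sanity check is to list the data files and count their records. The sketch below is only illustrative: it assumes the data files follow the `*_dataset.jsonl` naming seen in the data links under License, that each non-empty line is one record, and that the clone lands in a directory named `Multi-SWE-bench`. The helper names are hypothetical, and the actual repository layout may differ.

```python
from pathlib import Path
from typing import Dict, List


def find_dataset_files(root: Path) -> List[Path]:
    """Return every *_dataset.jsonl file under root (assumed naming scheme)."""
    if not root.is_dir():
        return []
    return sorted(root.rglob("*_dataset.jsonl"))


def count_records(path: Path) -> int:
    """Count non-empty lines, assuming one JSON record per line."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def records_per_file(root: Path) -> Dict[str, int]:
    """Map each data file (path relative to root) to its record count."""
    return {
        path.relative_to(root).as_posix(): count_records(path)
        for path in find_dataset_files(root)
    }


if __name__ == "__main__":
    # Assumed name of the directory created by the `git clone` command above.
    clone_dir = Path("Multi-SWE-bench")
    for name, count in records_per_file(clone_dir).items():
        print(f"{name}: {count} records")
```

If the script prints nothing, either the clone directory is elsewhere or Git LFS did not fetch the data files.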
## 🧩 Data Instances Structure
An example of a Multi-SWE-bench datum is as follows:
```
org: (str) - Organization name identifier from GitHub.
repo: (str) - Repository name identifier from GitHub.
number: (int) - The PR number.
state: (str) - The PR state.
title: (str) - The PR title.
body: (str) - The PR body.
base: (dict) - The target branch information of the PR.
resolved_issues: (list) - A JSON list of strings that represent the issues resolved by applying the PR.
fix_patch: (str) - A fix-file patch that was contributed by the solution PR.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
fixed_tests: (dict) - A json dict of strings that represent tests that should be fixed after the PR application.
p2p_tests: (dict) - The tests that should pass before and after the PR application.
f2p_tests: (dict) - The tests resolved by the PR and tied to the issue resolution.
s2p_tests: (dict) - The tests that should skip before the PR application, and pass after the PR application.
n2p_tests: (dict) - The tests that did not exist before the PR application and should pass after it.
run_result: (dict) - Overall run results, including number of tests passed, number of tests failed, etc.
test_patch_result: (dict) - The result after the test patch was applied.
fix_patch_result: (dict) - The result after all the patches were applied.
instance_id: (str) - A formatted instance identifier, usually as org__repo_PR-number.
```
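To make the field list more concrete, the following sketch parses JSON Lines records and summarises each one. It assumes each line of a data file is a single JSON object carrying the fields above and that the `*_tests` fields are dicts with one entry per test; neither detail is confirmed here. The helper names `iter_instances` and `summarize` and the sample record are hypothetical, not real dataset content.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Test-transition fields described in the field list above.
TEST_FIELDS = ("p2p_tests", "f2p_tests", "s2p_tests", "n2p_tests")


def iter_instances(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty line (assumes one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Collect identifying fields and count the entries in each test dict."""
    summary: Dict[str, Any] = {
        "org": record.get("org"),
        "repo": record.get("repo"),
        "number": record.get("number"),
        "resolved_issues": len(record.get("resolved_issues") or []),
    }
    for field in TEST_FIELDS:
        # Assumption: each *_tests dict holds one entry per test.
        summary[field] = len(record.get(field) or {})
    return summary


# Made-up record for illustration only.
SAMPLE_LINE = json.dumps({
    "org": "example-org",
    "repo": "example-repo",
    "number": 1,
    "resolved_issues": ["placeholder issue"],
    "p2p_tests": {"test_b": {}, "test_c": {}},
    "f2p_tests": {"test_a": {}},
    "s2p_tests": {},
    "n2p_tests": {},
})

if __name__ == "__main__":
    for rec in iter_instances([SAMPLE_LINE, ""]):
        print(summarize(rec))
```

Using `.get` with fallbacks keeps the summary tolerant of records that omit a field, which seems safer than indexing directly when the exact schema has not been verified.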
## 📚 Citation
```
@misc{zan2025multiswebench,
title={Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving},
author={Daoguang Zan and Zhirong Huang and Wei Liu and Hanwu Chen and Linhao Zhang and Shulin Xin and Lu Chen and Qi Liu and Xiaojian Zhong and Aoyan Li and Siyao Liu and Yongsheng Xiao and Liangqiang Chen and Yuyu Zhang and Jing Su and Tianyu Liu and Rui Long and Kai Shen and Liang Xiang},
year={2025},
eprint={2504.02605},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2504.02605},
}
```
## 📜 License
The dataset is licensed under CC0, subject to any intellectual property rights in the dataset owned by ByteDance. The data is adapted from the open source projects listed below; your use of that data must comply with their respective licenses.

| Language | Organization/Repository | Repository Link | Data Link |
| :------- | :------------------------------ | :----------------------------------------------------------- | ------------------------------------------------------------ |
| C | facebook/zstd | [repo_link](https://github.com/facebook/zstd) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/c/facebook__zstd_dataset.jsonl) |
| C | jqlang/jq | [repo_link](https://github.com/jqlang/jq) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/c/jqlang__jq_dataset.jsonl) |
| C | ponylang/ponyc | [repo_link](https://github.com/ponylang/ponyc) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/c/ponylang__ponyc_dataset.jsonl) |
| C++ | catchorg/Catch2 | [repo_link](https://github.com/catchorg/Catch2) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/cpp/catchorg__Catch2_dataset.jsonl) |
| C++ | fmtlib/fmt | [repo_link](https://github.com/fmtlib/fmt) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/cpp/fmtlib__fmt_dataset.jsonl) |
| C++ | nlohmann/json | [repo_link](https://github.com/nlohmann/json) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/cpp/nlohmann__json_dataset.jsonl) |
| C++ | simdjson/simdjson | [repo_link](https://github.com/simdjson/simdjson) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/cpp/simdjson__simdjson_dataset.jsonl) |
| C++ | yhirose/cpp-httplib | [repo_link](https://github.com/yhirose/cpp-httplib) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/cpp/yhirose__cpp-httplib_dataset.jsonl) |
| Go | cli/cli | [repo_link](https://github.com/cli/cli) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/go/cli__cli_dataset.jsonl) |
| Go | grpc/grpc-go | [repo_link](https://github.com/grpc/grpc-go) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/go/grpc__grpc-go_dataset.jsonl) |
| Go | zeromicro/go-zero | [repo_link](https://github.com/zeromicro/go-zero) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/go/zeromicro__go-zero_dataset.jsonl) |
| Java | alibaba/fastjson2 | [repo_link](https://github.com/alibaba/fastjson2) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/java/alibaba__fastjson2_dataset.jsonl) |
| Java | elastic/logstash | [repo_link](https://github.com/elastic/logstash) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/java/elastic__logstash_dataset.jsonl) |
| Java | mockito/mockito | [repo_link](https://github.com/mockito/mockito) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/java/mockito__mockito_dataset.jsonl) |
| JS | anuraghazra/github-readme-stats | [repo_link](https://github.com/anuraghazra/github-readme-stats) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/js/anuraghazra__github-readme-stats_dataset.jsonl) |
| JS | axios/axios | [repo_link](https://github.com/axios/axios) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/js/axios__axios_dataset.jsonl) |
| JS | expressjs/express | [repo_link](https://github.com/expressjs/express) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/js/expressjs__express_dataset.jsonl) |
| JS | iamkun/dayjs | [repo_link](https://github.com/iamkun/dayjs) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/js/iamkun__dayjs_dataset.jsonl) |
| JS | Kong/insomnia | [repo_link](https://github.com/Kong/insomnia) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/js/Kong__insomnia_dataset.jsonl) |
| JS | sveltejs/svelte | [repo_link](https://github.com/sveltejs/svelte) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/js/sveltejs__svelte_dataset.jsonl) |
| Rust | BurntSushi/ripgrep | [repo_link](https://github.com/BurntSushi/ripgrep) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/BurntSushi__ripgrep_dataset.jsonl) |
| Rust | clap-rs/clap | [repo_link](https://github.com/clap-rs/clap) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/clap-rs__clap_dataset.jsonl) |
| Rust | nushell/nushell | [repo_link](https://github.com/nushell/nushell) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/nushell__nushell_dataset.jsonl) |
| Rust | serde-rs/serde | [repo_link](https://github.com/serde-rs/serde) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/serde-rs__serde_dataset.jsonl) |
| Rust | sharkdp/bat | [repo_link](https://github.com/sharkdp/bat) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/sharkdp__bat_dataset.jsonl) |
| Rust | sharkdp/fd | [repo_link](https://github.com/sharkdp/fd) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/sharkdp__fd_dataset.jsonl) |
| Rust | rayon-rs/rayon | [repo_link](https://github.com/rayon-rs/rayon) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/rayon-rs__rayon_dataset.jsonl) |
| Rust | tokio-rs/bytes | [repo_link](https://github.com/tokio-rs/bytes) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/tokio-rs__bytes_dataset.jsonl) |
| Rust | tokio-rs/tokio | [repo_link](https://github.com/tokio-rs/tokio) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/tokio-rs__tokio_dataset.jsonl) |
| Rust | tokio-rs/tracing | [repo_link](https://github.com/tokio-rs/tracing) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/rust/tokio-rs__tracing_dataset.jsonl) |
| TS | darkreader/darkreader | [repo_link](https://github.com/darkreader/darkreader) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/ts/darkreader__darkreader_dataset.jsonl) |
| TS | mui/material-ui | [repo_link](https://github.com/mui/material-ui) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/ts/mui__material-ui_dataset.jsonl) |
| TS | vuejs/core | [repo_link](https://github.com/vuejs/core) | [data_link](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench/blob/main/ts/vuejs__core_dataset.jsonl) |

Provider: maas

Created: 2025-04-18



