nebius/SWE-rebench-V2-PRs
Source: Hugging Face (updated 2026-03-03, indexed 2026-04-05)
Download link: https://hf-mirror.com/datasets/nebius/SWE-rebench-V2-PRs
Resource overview:
---
license: cc-by-4.0
task_categories:
- text-generation
language:
- en
tags:
- code
- software-engineering
- swe-bench
- pull-requests
- nebius
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: base_commit
    dtype: string
  - name: created_at
    dtype: string
  - name: hints_text
    dtype: string
  - name: instance_id
    dtype: string
  - name: patch
    dtype: string
  - name: pr_description
    dtype: string
  - name: problem_statement
    dtype: string
  - name: pull_number
    dtype: int64
  - name: repo
    dtype: string
  - name: test_patch
    dtype: string
  - name: FAIL_TO_PASS
    sequence: string
  - name: PASS_TO_PASS
    sequence: string
  - name: interface
    dtype: string
  - name: license
    dtype: string
  - name: install_config
    struct:
    - name: base_image_name
      dtype: string
    - name: install
      sequence: string
    - name: log_parser
      dtype: string
    - name: test_cmd
      dtype: string
  - name: meta
    struct:
    - name: num_modified_files
      dtype: int64
    - name: num_modified_lines
      dtype: int64
    - name: pr_author
      dtype: string
    - name: pr_labels
      sequence: string
    - name: llm_metadata
      struct:
      - name: code
        dtype: string
      - name: code_quality
        dtype: string
      - name: confidence
        dtype: float64
      - name: detected_issues
        struct:
        - name: B1
          dtype: bool
        - name: B2
          dtype: bool
        - name: B3
          dtype: bool
        - name: B4
          dtype: bool
        - name: B5
          dtype: bool
        - name: B6
          dtype: bool
      - name: detected_issues_explanation
        dtype: string
      - name: detecte d_issues
        dtype: string
      - name: difficulty
        dtype: string
      - name: external_urls
        sequence: string
      - name: intent_completeness
        dtype: string
      - name: patch
        dtype: string
      - name: pr_categories
        sequence: string
      - name: reason
        dtype: string
      - name: reasoning
        dtype: string
      - name: suggested_fixes
        sequence: string
      - name: test_alignment
        sequence: string
      - name: test_alignment_issues
        sequence: string
      - name: test_alignment_quick_tree
        sequence: string
      - name: test_alignment_quick_tree_bootstrap
        sequence: string
      - name: test_alignment_quick_tree_mocks
        sequence: string
      - name: test_alignment_quick_tree_params
        sequence: string
      - name: test_alignment_quick_tree_unrelated
        sequence: string
      - name: test_alignment_quick_tree_use_hook
        sequence: string
      - name: test_alignment_quick_tree_use_hook_unrelated
        sequence: string
      - name: test_alignment_sample_without_replacement
        sequence: string
      - name: test_alignment_test_alignment_sample_without_replacement
        sequence: string
      - name: test_build_phylogeny
        sequence: string
      - name: test_build_phylogeny_unrelated
        sequence: string
      - name: test_build_phylogeny_use_hook
        sequence: string
      - name: test_build_phylogeny_use_hook_unrelated
        sequence: string
      - name: test_core_seq_test_sample_motif_length_1
        sequence: string
      - name: test_core_seq_test_sample_motif_length_3
        sequence: string
      - name: test_core_seq_test_sample_without_replacement
        sequence: string
      - name: test_core_sequence
        sequence: string
      - name: test_core_sequence_test_sample_motif_length_1
        sequence: string
      - name: test_core_sequence_test_sample_motif_length_3
        sequence: string
      - name: test_core_sequence_test_sample_without_replacement
        sequence: string
      - name: test_sample_motif_length_1
        sequence: string
      - name: test_sample_motif_length_3
        sequence: string
      - name: test_sample_without_replacement
        sequence: string
  splits:
  - name: train
    num_bytes: 14180938050
    num_examples: 126300
  download_size: 2686298152
  dataset_size: 14180938050
---
# SWE-rebench-V2-PRs
## Dataset Summary
SWE-rebench-V2-PRs is a large-scale dataset of real-world GitHub pull requests collected across multiple programming languages, intended for training and evaluating code-generation and software-engineering agents. The dataset contains 126,300 samples covering Go, Python, JavaScript, TypeScript, Rust, Java, C, C++, Julia, Elixir, Kotlin, PHP, Scala, Clojure, Dart, OCaml, and other languages.
For the log parser functions, base Dockerfiles, and prompts used, see https://github.com/SWE-rebench/SWE-rebench-V2.
The detailed technical report is available at [“SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale”](https://arxiv.org/abs/2602.23866).
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("nebius/SWE-rebench-V2-PRs", split="train")
print(len(ds)) # 126300
```
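The card metadata reports a `dataset_size` of 14180938050 bytes (about 14.2 GB), so it may be preferable to iterate lazily instead of materializing the whole split. The following is a minimal sketch, not part of the original card: `first_matching` is a hypothetical helper, and the in-memory records are made-up stand-ins that carry only the `repo` and `pull_number` fields.

```python
from itertools import islice


def first_matching(records, repo, limit):
    """Return up to `limit` records from `records` whose `repo` equals `repo`.

    `records` can be any iterable of dicts; iteration stops as soon as
    `limit` matches have been collected.
    """
    matches = (r for r in records if r["repo"] == repo)
    return list(islice(matches, limit))


# Made-up stand-in records (not real dataset rows).
sample_records = [
    {"repo": "example/alpha", "pull_number": 1},
    {"repo": "example/beta", "pull_number": 2},
    {"repo": "example/alpha", "pull_number": 3},
    {"repo": "example/alpha", "pull_number": 4},
]

picked = first_matching(sample_records, "example/alpha", limit=2)
print([r["pull_number"] for r in picked])  # [1, 3]
```

With the `datasets` library, passing `streaming=True` to `load_dataset` returns an iterable dataset that could be supplied in place of `sample_records`, which avoids downloading every shard up front.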
## Dataset Structure
| Field | Type | Description |
|---|---|---|
| `instance_id` | `string` | Unique identifier for the instance |
| `repo` | `string` | GitHub repository in `owner/repo` format |
| `pull_number` | `int64` | Pull request number |
| `base_commit` | `string` | Git commit SHA of the base before the PR |
| `patch` | `string` | The gold patch introduced by the pull request |
| `test_patch` | `string` | Diff adding or modifying tests that verify the patch |
| `problem_statement` | `string` | Issue description the pull request addresses |
| `pr_description` | `string` | Full pull request description |
| `hints_text` | `string` | Additional hints extracted from the issue thread |
| `created_at` | `string` | Timestamp of PR creation, serialized as a string |
| `FAIL_TO_PASS` | `list[string]` | Test IDs that fail before the patch and pass after |
| `PASS_TO_PASS` | `list[string]` | Test IDs that pass both before and after the patch |
| `interface` | `string` | Description of the code interface changed by the PR |
| `license` | `string` | SPDX license identifier of the repository |
| `install_config` | `struct` | Configuration needed to reproduce the test environment |
| `meta` | `struct` | Metadata and LLM-generated quality annotations |
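The `FAIL_TO_PASS` and `PASS_TO_PASS` fields lend themselves to a simple success check for a candidate patch. The sketch below assumes the criterion commonly used with SWE-bench-style data (every listed test must pass after the patch is applied); the card itself does not prescribe an evaluation rule. `is_resolved` and `passed_after` are hypothetical names, and the record is a made-up example holding only the two fields needed.

```python
def is_resolved(record, passed_after):
    """Return True if every FAIL_TO_PASS and PASS_TO_PASS test passed.

    `passed_after` is the collection of test IDs that passed after applying
    a candidate patch (hypothetical input produced by your own test runner).
    """
    required = set(record["FAIL_TO_PASS"]) | set(record["PASS_TO_PASS"])
    return required <= set(passed_after)


# Made-up record with only the two fields this sketch needs.
record = {
    "FAIL_TO_PASS": ["tests/test_api.py::test_new_flag"],
    "PASS_TO_PASS": ["tests/test_api.py::test_existing"],
}

all_passed = {
    "tests/test_api.py::test_new_flag",
    "tests/test_api.py::test_existing",
}
only_old_passed = {"tests/test_api.py::test_existing"}

print(is_resolved(record, all_passed))       # True
print(is_resolved(record, only_old_passed))  # False
```

Using set containment keeps the check independent of test ordering and tolerates extra passing tests that are not listed in either field.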
## License
The dataset is licensed under the Creative Commons Attribution 4.0 license. However, please respect the license of each specific repository on which a particular instance is based. To facilitate this, the license of each repository at the time of the commit is provided for every instance.
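Because each instance carries its repository's license in the `license` field, one way to respect per-repository terms is to filter instances against an allowlist before use. This is a minimal sketch under stated assumptions: `ALLOWED_LICENSES` is a hypothetical allowlist, the sample records are made up, and the exact identifier strings stored in the dataset should be checked before relying on exact matching.

```python
# Hypothetical allowlist; pick the SPDX identifiers appropriate for your use.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def filter_by_license(records, allowed=ALLOWED_LICENSES):
    """Keep only records whose `license` value is in `allowed`."""
    return [r for r in records if r["license"] in allowed]


# Made-up stand-in records (not real dataset rows).
sample = [
    {"instance_id": "a", "license": "MIT"},
    {"instance_id": "b", "license": "GPL-3.0-only"},
    {"instance_id": "c", "license": "Apache-2.0"},
]

kept = filter_by_license(sample)
print([r["instance_id"] for r in kept])  # ['a', 'c']
```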
## Citation
```bibtex
@misc{badertdinov2026swerebenchv2languageagnosticswe,
title={SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale},
author={Ibragim Badertdinov and Maksim Nekrashevich and Anton Shevtsov and Alexander Golubev},
year={2026},
eprint={2602.23866},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2602.23866},
}
```
Provider: nebius



