
nebius/SWE-rebench-V2-PRs

Source: Hugging Face. Updated 2026-03-03; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/nebius/SWE-rebench-V2-PRs
Description:
---
license: cc-by-4.0
task_categories:
- text-generation
language:
- en
tags:
- code
- software-engineering
- swe-bench
- pull-requests
- nebius
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: base_commit
    dtype: string
  - name: created_at
    dtype: string
  - name: hints_text
    dtype: string
  - name: instance_id
    dtype: string
  - name: patch
    dtype: string
  - name: pr_description
    dtype: string
  - name: problem_statement
    dtype: string
  - name: pull_number
    dtype: int64
  - name: repo
    dtype: string
  - name: test_patch
    dtype: string
  - name: FAIL_TO_PASS
    sequence: string
  - name: PASS_TO_PASS
    sequence: string
  - name: interface
    dtype: string
  - name: license
    dtype: string
  - name: install_config
    struct:
    - name: base_image_name
      dtype: string
    - name: install
      sequence: string
    - name: log_parser
      dtype: string
    - name: test_cmd
      dtype: string
  - name: meta
    struct:
    - name: num_modified_files
      dtype: int64
    - name: num_modified_lines
      dtype: int64
    - name: pr_author
      dtype: string
    - name: pr_labels
      sequence: string
    - name: llm_metadata
      struct:
      - name: code
        dtype: string
      - name: code_quality
        dtype: string
      - name: confidence
        dtype: float64
      - name: detected_issues
        struct:
        - name: B1
          dtype: bool
        - name: B2
          dtype: bool
        - name: B3
          dtype: bool
        - name: B4
          dtype: bool
        - name: B5
          dtype: bool
        - name: B6
          dtype: bool
      - name: detected_issues_explanation
        dtype: string
      - name: detecte d_issues
        dtype: string
      - name: difficulty
        dtype: string
      - name: external_urls
        sequence: string
      - name: intent_completeness
        dtype: string
      - name: patch
        dtype: string
      - name: pr_categories
        sequence: string
      - name: reason
        dtype: string
      - name: reasoning
        dtype: string
      - name: suggested_fixes
        sequence: string
      - name: test_alignment
        sequence: string
      - name: test_alignment_issues
        sequence: string
      - name: test_alignment_quick_tree
        sequence: string
      - name: test_alignment_quick_tree_bootstrap
        sequence: string
      - name: test_alignment_quick_tree_mocks
        sequence: string
      - name: test_alignment_quick_tree_params
        sequence: string
      - name: test_alignment_quick_tree_unrelated
        sequence: string
      - name: test_alignment_quick_tree_use_hook
        sequence: string
      - name: test_alignment_quick_tree_use_hook_unrelated
        sequence: string
      - name: test_alignment_sample_without_replacement
        sequence: string
      - name: test_alignment_test_alignment_sample_without_replacement
        sequence: string
      - name: test_build_phylogeny
        sequence: string
      - name: test_build_phylogeny_unrelated
        sequence: string
      - name: test_build_phylogeny_use_hook
        sequence: string
      - name: test_build_phylogeny_use_hook_unrelated
        sequence: string
      - name: test_core_seq_test_sample_motif_length_1
        sequence: string
      - name: test_core_seq_test_sample_motif_length_3
        sequence: string
      - name: test_core_seq_test_sample_without_replacement
        sequence: string
      - name: test_core_sequence
        sequence: string
      - name: test_core_sequence_test_sample_motif_length_1
        sequence: string
      - name: test_core_sequence_test_sample_motif_length_3
        sequence: string
      - name: test_core_sequence_test_sample_without_replacement
        sequence: string
      - name: test_sample_motif_length_1
        sequence: string
      - name: test_sample_motif_length_3
        sequence: string
      - name: test_sample_without_replacement
        sequence: string
  splits:
  - name: train
    num_bytes: 14180938050
    num_examples: 126300
  download_size: 2686298152
  dataset_size: 14180938050
---

# SWE-rebench-V2-PRs

## Dataset Summary

SWE-rebench-V2-PRs is a large-scale dataset of real-world GitHub pull requests collected across multiple programming languages, intended for training and evaluating code-generation and software-engineering agents. The dataset contains 126,300 samples covering Go, Python, JavaScript, TypeScript, Rust, Java, C, C++, Julia, Elixir, Kotlin, PHP, Scala, Clojure, Dart, OCaml, and other languages.

For log parser functions, base Dockerfiles, and the prompts used, please see https://github.com/SWE-rebench/SWE-rebench-V2.

The detailed technical report is available at [“SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale”](https://arxiv.org/abs/2602.23866).

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("nebius/SWE-rebench-V2-PRs", split="train")
print(len(ds))  # 126300
```

## Dataset Structure

| Field | Type | Description |
|---|---|---|
| `instance_id` | `string` | Unique identifier for the instance |
| `repo` | `string` | GitHub repository in `owner/repo` format |
| `pull_number` | `int64` | Pull request number |
| `base_commit` | `string` | Git commit SHA of the base before the PR |
| `patch` | `string` | The gold patch introduced by the pull request |
| `test_patch` | `string` | Diff adding or modifying tests that verify the patch |
| `problem_statement` | `string` | Issue description the pull request addresses |
| `pr_description` | `string` | Full pull request description |
| `hints_text` | `string` | Additional hints extracted from the issue thread |
| `created_at` | `int64` | Unix timestamp (milliseconds) of PR creation |
| `FAIL_TO_PASS` | `list[string]` | Test IDs that fail before the patch and pass after |
| `PASS_TO_PASS` | `list[string]` | Test IDs that pass both before and after the patch |
| `interface` | `string` | Description of the code interface changed by the PR |
| `license` | `string` | SPDX license identifier of the repository |
| `install_config` | `struct` | Configuration needed to reproduce the test environment |
| `meta` | `struct` | Metadata and LLM-generated quality annotations |

# License

The dataset is licensed under the Creative Commons Attribution 4.0 license. However, please respect the license of each specific repository on which a particular instance is based. To facilitate this, the license of each repository at the time of the commit is provided for every instance.
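
One way to act on the per-instance `license` field is to tally the values and keep only the instances whose license is on an allow-list. The sketch below is illustrative only: the helper names, the allow-list, and the sample rows are hypothetical, and the exact strings stored in `license` are assumed to be SPDX identifiers such as `MIT`, as the field description suggests.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical allow-list; adjust it to your own licensing requirements.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def count_licenses(rows: Iterable[Mapping]) -> Counter:
    """Count instances per value of the `license` field."""
    return Counter(row["license"] for row in rows)


def keep_allowed(rows: Iterable[Mapping], allowed: set) -> list:
    """Return only the rows whose `license` value is in `allowed`."""
    return [row for row in rows if row["license"] in allowed]


# Made-up rows standing in for dataset instances (not real data).
sample_rows = [
    {"instance_id": "example-1", "repo": "owner/a", "license": "MIT"},
    {"instance_id": "example-2", "repo": "owner/b", "license": "GPL-3.0"},
    {"instance_id": "example-3", "repo": "owner/c", "license": "MIT"},
]

license_counts = count_licenses(sample_rows)
allowed_rows = keep_allowed(sample_rows, ALLOWED_LICENSES)

print(license_counts.most_common())
print([row["instance_id"] for row in allowed_rows])
```

The same predicate should also work on the loaded dataset, for example `ds.filter(lambda row: row["license"] in ALLOWED_LICENSES)`, assuming every row carries a non-null `license` value.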

# Citation

```bibtex
@misc{badertdinov2026swerebenchv2languageagnosticswe,
  title={SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale},
  author={Ibragim Badertdinov and Maksim Nekrashevich and Anton Shevtsov and Alexander Golubev},
  year={2026},
  eprint={2602.23866},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2602.23866},
}
```

Provider: nebius