princeton-nlp/SWE-bench_bm25_13K

Name: princeton-nlp/SWE-bench_bm25_13K
Creator: princeton-nlp
Published: 2024-04-15 22:10:52
License: No description available

Source: Hugging Face (updated 2024-04-15, indexed 2024-03-04)

Download link:

https://hf-mirror.com/datasets/princeton-nlp/SWE-bench_bm25_13K

Resource summary:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: dev
    path: data/dev-*
  - split: test
    path: data/test-*
  - split: validation
    path: data/validation-*
dataset_info:
  features:
  - name: instance_id
    dtype: string
  - name: text
    dtype: string
  - name: repo
    dtype: string
  - name: base_commit
    dtype: string
  - name: problem_statement
    dtype: string
  - name: hints_text
    dtype: string
  - name: created_at
    dtype: string
  - name: patch
    dtype: string
  - name: test_patch
    dtype: string
  - name: version
    dtype: string
  - name: FAIL_TO_PASS
    dtype: string
  - name: PASS_TO_PASS
    dtype: string
  - name: environment_setup_commit
    dtype: string
  splits:
  - name: train
    num_bytes: 1537849718
    num_examples: 18817
  - name: dev
    num_bytes: 15941600
    num_examples: 225
  - name: test
    num_bytes: 156543048
    num_examples: 2294
  - name: validation
    num_bytes: 16292656
    num_examples: 191
  download_size: 744411715
  dataset_size: 1726627022
---

# Dataset Card for "SWE-bench_bm25_13K"

### Dataset Summary

SWE-bench is a dataset that tests systems' ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification, using post-PR behavior as the reference solution.

The dataset was released as part of [SWE-bench: Can Language Models Resolve Real-World GitHub Issues?](https://arxiv.org/abs/2310.06770)

This dataset, `SWE-bench_bm25_13K`, formats each instance using Pyserini's BM25 retrieval, as described in the paper. The code context is limited to 13,000 `cl100k_base` tokens, as counted by the [`tiktoken`](https://github.com/openai/tiktoken) tokenization package used for OpenAI models. The `text` column can be passed directly to LMs to generate patch files.

Models are instructed to generate a [`patch`](https://en.wikipedia.org/wiki/Patch_(Unix))-formatted file using the following template:

```diff
<patch>
diff
--- a/path/to/file.py
+++ b/path/to/file.py
@@ -1,3 +1,3 @@
 This is a test file.
-It contains several lines.
+It has been modified.
 This is the third line.
</patch>
```

This format can be used directly with the [SWE-bench inference scripts](https://github.com/princeton-nlp/SWE-bench/tree/main/inference). Please refer to these scripts for more details on inference.

### Supported Tasks and Leaderboards

SWE-bench proposes a new task: issue resolution, given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com.

### Languages

The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.

## Dataset Structure

### Data Instances

Each SWE-bench instance has the following fields:

```
instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
text: (str) - The input text, including instructions, the BM25-retrieved files, and an example of the patch format for output.
patch: (str) - The gold patch, i.e. the patch generated by the PR (minus test-related code) that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash of the repository representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue prior to the creation date of the solution PR's first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - Commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings representing the set of tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings representing tests that should pass both before and after the PR is applied.
```

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
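Because the `text` prompt asks for output wrapped in `<patch>` and `</patch>` tags, and because `FAIL_TO_PASS` and `PASS_TO_PASS` are JSON lists stored as strings, some light post-processing is needed before evaluation. The following is a minimal sketch, assuming a model response that follows the patch template in this card; the helpers `extract_patch` and `parse_test_lists` and the sample record are hypothetical illustrations, not part of the official SWE-bench inference or evaluation scripts.

```python
import json
import re
from typing import Dict, List, Optional

# Hypothetical pattern: capture everything between the first <patch> and </patch>
# tags, dropping only the single newline that directly follows the opening tag.
PATCH_RE = re.compile(r"<patch>\n?(.*?)</patch>", re.DOTALL)


def extract_patch(response: str) -> Optional[str]:
    """Return the diff inside the first <patch> block, or None if there is none."""
    match = PATCH_RE.search(response)
    if match is None:
        return None
    patch = match.group(1)
    # Ensure a trailing newline so the last hunk line is complete for patch tools.
    return patch if patch.endswith("\n") else patch + "\n"


def parse_test_lists(record: Dict[str, str]) -> Dict[str, List[str]]:
    """Decode the JSON-encoded FAIL_TO_PASS and PASS_TO_PASS string fields."""
    return {key: json.loads(record[key]) for key in ("FAIL_TO_PASS", "PASS_TO_PASS")}


if __name__ == "__main__":
    # Hypothetical record: same field types as the dataset, made-up values.
    record = {
        "instance_id": "example__repo-1",
        "FAIL_TO_PASS": '["tests/test_mod.py::test_fix"]',
        "PASS_TO_PASS": '["tests/test_mod.py::test_existing"]',
    }
    response = (
        "Here is my fix.\n"
        "<patch>\n"
        "--- a/path/to/file.py\n"
        "+++ b/path/to/file.py\n"
        "@@ -1,3 +1,3 @@\n"
        " This is a test file.\n"
        "-It contains several lines.\n"
        "+It has been modified.\n"
        " This is the third line.\n"
        "</patch>\n"
    )
    print(parse_test_lists(record)["FAIL_TO_PASS"])  # ['tests/test_mod.py::test_fix']
    print(extract_patch(response).splitlines()[1])  # +++ b/path/to/file.py
```

Taking only the first `<patch>` block is a simplifying choice; responses with several blocks or none need their own policy. To check lengths against the 13,000-token budget yourself, `tiktoken.get_encoding("cl100k_base").encode(...)` returns the token sequence for a string.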

