SWE-agent-trajectories
收藏魔搭社区2026-05-23 更新2024-12-28 收录
下载链接:
https://modelscope.cn/datasets/AI-ModelScope/SWE-agent-trajectories
下载链接
链接失效反馈官方服务:
资源简介:
# Dataset Summary
This dataset contains 80,036 trajectories generated by a software engineering agent based on the SWE-agent framework, using various models as action generators. In these trajectories, the agent attempts to solve GitHub issues from the [nebius/SWE-bench-extra](https://huggingface.co/datasets/nebius/SWE-bench-extra) and the dev split of [princeton-nlp/SWE-bench](https://huggingface.co/datasets/princeton-nlp/SWE-bench).
# Dataset Description
This dataset was created as part of a research project focused on developing a software engineering agent using open-weight models, which achieved a score of 40.6% on the SWE-bench Verified benchmark. The detailed process of achieving this is outlined in our blog post ["Leveraging training and search for better software engineering agents"](https://nebius.com/blog/posts/training-and-search-for-software-engineering-agents).
The dataset collection consisted of two stages: collecting issue-resolution instances, following a methodology similar to SWE-bench, and generating a large number of trajectories for solving the collected issues. The generated code patches in these trajectories were evaluated by the tests from the linked pull requests to determine which of them passed the tests. The detailed process of collecting issue-resolution instances is described in our blog post ["Scaling Data Collection for Training Software Engineering Agents"](https://nebius.com/blog/posts/scaling-data-collection-for-training-swe-agents).
# How to use
```python
from datasets import load_dataset
ds = load_dataset('nebius/swe-agent-trajectories')
```
# Dataset Statistics
Key statistics on issue resolution and patch edits, including steps, context length, edit details, exit rates, and correctness metrics, aggregated across 80,036 instances.
| | | Issue Resolved | Issue Not Resolved |
|-------------------------------|-------------------------------------|----------------|--------------------|
| **Trajectory** | Average steps count | 31.3 | 58.4 |
| | Average context length (Llama 3 tokenizer) | 8,352.4 | 15,241 |
| **Final Patch** | Files edited | 1.33 | 2.17 |
| | Lines edited | 20.7 | 61.0 |
| **Exit Status Rate (Grouped by Target)** | Submits | 94.6% | 57.6% |
| | Exit context | 5.31% | 30.4% |
| | Other | 0.37% | 12% |
| **Correct Steps** | At least one correct file opened | 83% | 40% |
| **Total** | Instances | 13,389 | 66,647 |
# Dataset Structure
An agent's trajectory includes the following information:
| Field Name | Type | Description |
|------------------|------|-------------------------------------------------------------------------------------------------------|
| instance_id | str | The identifier of the instance with the issue the agent tried to solve, consisting of the repository name and issue number. |
| model_name | str | The name of the model used to generate the trajectory. |
| target | bool | Whether the model solved the issue in this trajectory. |
| trajectory | str | A JSON list with the logged trajectory consisting of a list of model reasoning and actions (under the role: ai) and observations from the environment (under the role: user). The first entry is the system prompt under the system role. |
| exit_status | str | The status of the agent's completion. |
| generated_patch | str | The final patch generated by the model while modifying the project files. |
| eval_logs | str | The logs of test execution to verify the final patch. |
# License
The dataset is licensed under the Creative Commons Attribution 4.0 license. However, please respect the license of each specific repository on which a particular instance is based. To facilitate this, the license of each repository at the time of the commit is provided for every instance in [SWE-bench-extra](https://huggingface.co/datasets/nebius/SWE-bench-extra).
Additionally, a notice for users: if you intend to use the outputs of these models, you must comply with the [Llama 3.1 License](https://www.llama.com/llama3_1/license/).
# 数据集概述
本数据集包含80036条由基于SWE-agent框架的软件工程智能体(software engineering agent)生成的轨迹,该智能体以多种模型作为动作生成器。在这些轨迹中,智能体尝试解决来自[nebius/SWE-bench-extra](https://huggingface.co/datasets/nebius/SWE-bench-extra)以及[princeton-nlp/SWE-bench](https://huggingface.co/datasets/princeton-nlp/SWE-bench)开发拆分集的GitHub议题。
# 数据集说明
本数据集是一项聚焦于使用开源权重模型开发软件工程智能体的研究项目的成果,该智能体在SWE-bench Verified基准测试中取得了40.6%的得分。对应的详细实现过程可参阅我们的博客文章《利用训练与搜索优化软件工程智能体》(https://nebius.com/blog/posts/training-and-search-for-software-engineering-agents)。
数据集采集分为两个阶段:首先采集议题解决实例,采集方法与SWE-bench类似;随后为采集到的议题生成大量求解轨迹。我们通过关联拉取请求(Pull Request)中的测试用例对轨迹中生成的代码补丁进行评估,以判定哪些补丁可通过测试。议题解决实例的详细采集流程可参阅我们的博客文章《扩大软件工程智能体训练的数据采集规模》(https://nebius.com/blog/posts/scaling-data-collection-for-training-swe-agents)。
# 使用方法
python
from datasets import load_dataset
ds = load_dataset('nebius/swe-agent-trajectories')
# 数据集统计信息
本统计涵盖80036条实例的议题解决与补丁编辑相关关键指标,包括步骤数、上下文长度、编辑详情、退出率与正确性指标等。
| | | 议题已解决 | 议题未解决 |
|-------------------------------|-------------------------------------|------------|------------|
| **轨迹** | 平均步骤数 | 31.3 | 58.4 |
| | 平均上下文长度(Llama 3分词器) | 8,352.4 | 15,241 |
| **最终补丁** | 编辑文件数 | 1.33 | 2.17 |
| | 编辑行数 | 20.7 | 61.0 |
| **按目标分组的退出状态率** | 提交操作 | 94.6% | 57.6% |
| | 退出上下文 | 5.31% | 30.4% |
| | 其他 | 0.37% | 12% |
| **正确步骤** | 至少打开一个正确文件 | 83% | 40% |
| **总计** | 实例数 | 13,389 | 66,647 |
# 数据集结构
智能体轨迹包含以下字段信息:
| 字段名 | 类型 | 描述 |
|------------------|------|-------------------------------------------------------------------------------------------------------|
| instance_id | 字符串 | 待解决议题的实例标识符,由仓库名称与议题编号组成。 |
| model_name | 字符串 | 用于生成该轨迹的模型名称。 |
| target | 布尔值 | 该轨迹对应的模型是否成功解决了议题。 |
| trajectory | 字符串 | 包含日志轨迹的JSON列表,由模型推理与动作(角色为ai)、环境观测信息(角色为user)组成的列表。列表首项为系统角色(system)对应的系统提示词。 |
| exit_status | 字符串 | 智能体任务完成的状态。 |
| generated_patch | 字符串 | 模型在修改项目文件时生成的最终补丁。 |
| eval_logs | 字符串 | 用于验证最终补丁的测试执行日志。 |
# 授权协议
本数据集采用知识共享署名4.0(Creative Commons Attribution 4.0)许可协议进行授权。但请同时尊重每个实例所基于的特定仓库的授权协议。为便于用户遵守相关要求,[SWE-bench-extra](https://huggingface.co/datasets/nebius/SWE-bench-extra)中的每个实例均提供了对应提交时该仓库的授权协议信息。
此外,特此向用户提示:若您计划使用这些模型的输出产物,则必须遵守[Llama 3.1许可协议](https://www.llama.com/llama3_1/license/)。
提供机构:
maas
创建时间:
2024-12-22
搜集汇总
数据集介绍

背景与挑战
背景概述
该数据集包含80,036条由软件工程代理生成的轨迹,用于解决GitHub问题,是研究开发基于开放权重模型的软件工程代理的一部分。数据集详细记录了每条轨迹的模型名称、解决状态、生成的补丁和测试日志等信息。
以上内容由遇见数据集搜集并总结生成



