DCAgent2/swebench_verified_Qwen3_Coder_30B_A3B_Instruct_20260427_232200-traces

Name: DCAgent2/swebench_verified_Qwen3_Coder_30B_A3B_Instruct_20260427_232200-traces
Creator: DCAgent2
Published: 2026-04-30 08:00:06
License: 暂无描述

Hugging Face2026-04-30 更新2026-05-03 收录

下载链接：

https://hf-mirror.com/datasets/DCAgent2/swebench_verified_Qwen3_Coder_30B_A3B_Instruct_20260427_232200-traces

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: conversations list: - name: content dtype: string - name: role dtype: string - name: agent dtype: string - name: model dtype: string - name: model_provider dtype: string - name: date dtype: string - name: task dtype: string - name: episode dtype: string - name: run_id dtype: string - name: trial_name dtype: string - name: tool_definitions list: - name: function struct: - name: description dtype: string - name: name dtype: string - name: parameters struct: - name: additionalProperties dtype: bool - name: properties struct: - name: code struct: - name: description dtype: string - name: type dtype: string - name: command struct: - name: description dtype: string - name: enum list: string - name: type dtype: string - name: file_text struct: - name: description dtype: string - name: type dtype: string - name: insert_line struct: - name: description dtype: string - name: type dtype: string - name: is_input struct: - name: description dtype: string - name: enum list: string - name: type dtype: string - name: message struct: - name: description dtype: string - name: type dtype: string - name: new_str struct: - name: description dtype: string - name: type dtype: string - name: old_str struct: - name: description dtype: string - name: type dtype: string - name: path struct: - name: description dtype: string - name: type dtype: string - name: security_risk struct: - name: description dtype: string - name: enum list: string - name: type dtype: string - name: task_list struct: - name: description dtype: string - name: items struct: - name: additionalProperties dtype: bool - name: properties struct: - name: id struct: - name: description dtype: string - name: type dtype: string - name: notes struct: - name: description dtype: string - name: type dtype: string - name: status struct: - name: description dtype: string - name: enum list: string - name: type dtype: string - name: title struct: - name: description dtype: string - name: type dtype: string - name: required list: string - name: type dtype: string - name: type dtype: string - name: thought struct: - name: description dtype: string - name: type dtype: string - name: timeout struct: - name: description dtype: string - name: type dtype: string - name: view_range struct: - name: description dtype: string - name: items struct: - name: type dtype: string - name: type dtype: string - name: required list: string - name: type dtype: string - name: type dtype: string - name: result dtype: string - name: verifier_output dtype: string splits: - name: train num_bytes: 563511112 num_examples: 1474 download_size: 501710964 dataset_size: 563511112 configs: - config_name: default data_files: - split: train path: data/train-* ---

提供机构：

DCAgent2

搜集汇总

数据集介绍

构建方式

该数据集源自SWE-bench验证集，通过调用Qwen3-Coder-30B-A3B-Instruct模型在2026年4月27日的特定时间点对软件工程任务进行推理，捕获了模型与工具交互的完整对话轨迹。每条数据均包含多轮`conversations`序列，按照`role`（角色）与`content`（内容）的结构化字段组织，同时记录了模型所使用的`tool_definitions`（工具定义），其中详细描述了函数名称、参数类型、枚举值及必填字段等信息，确保对话与工具调用逻辑的可追溯性。此外，数据还附带了`result`（最终结果）与`verifier_output`（验证器输出），为后续分析模型行为提供了闭合的反馈回路。

使用方法

该数据集适用于微调或评估代码生成与工具调用型语言模型。使用者可通过解析`conversations`字段中的`role`与`content`对来还原完整的交互历史，并利用`tool_definitions`中的结构化参数验证模型对函数调用的规范程度。推荐将其作为多轮推理基准，借助`result`与`verifier_output`字段自动判定模型输出是否正确。加载时，可使用Hugging Face的`load_dataset`函数指定`train`分片，并依据`config_name='default'`配置读取数据文件，其中`data/train-*`路径下的分片文件可直接迭代处理。

背景与挑战

背景概述

该数据集诞生于大语言模型与软件工程交叉融合的前沿领域，由研究团队基于先进的Qwen3_Coder_30B_A3B_Instruct模型于2025年4月构建而成。其核心研究问题聚焦于评估和增强代码生成模型在复杂软件工程任务（如SWE-bench基准）中的自主代理能力。通过记录模型在多轮交互中执行代码编辑、文件操作、命令调用等工具时的完整对话轨迹与验证结果，该数据集为分析智能体决策过程、理解模型在真实开发场景中的行为模式提供了珍贵的实证素材。作为连接代码理解与自动编程的桥梁，该数据集对推动自主软件开发系统的鲁棒性评估与模型优化具有显著影响力。

当前挑战

该数据集致力于攻克的关键领域问题在于，如何量化并提升大语言模型在端到端软件工程任务中的自主问题解决能力——即从理解复杂仓库结构、定位缺陷到生成并验证修复补丁的全链路性能，这远超出传统代码补全或简单问答任务的难度。在构建过程中，挑战主要体现在两方面：一是需要设计精细的工具调用架构以忠实记录模型在沙箱环境中的每一步操作（如文件读写、命令执行），确保回溯分析的完备性；二是面对长达数千行的交互日志，如何从异构的、含有多重依赖的对话流中抽取出可泛化的训练信号，避免模型过拟合于特定执行路径，同时维持数据质量与规模的平衡。

常用场景

经典使用场景

在软件工程与人工智能的交叉领域中，该数据集专注于记录代码生成智能体在复杂软件开发任务中的交互轨迹，其核心应用场景在于训练和评估大型语言模型在真实代码仓库环境下的修复与实现能力。数据集包含agent—环境之间的多轮会话、工具调用定义（如文件编辑、命令执行）、任务描述及最终结果，为研究如何构建能够自主理解代码库、定位缺陷并生成修正方案的智能体提供了宝贵的训练和测试基准。

解决学术问题

该数据集的构建解决了学术界在评估代码生成模型时长期面临的缺乏真实、可验证环境的问题。通过记录包含完整交互和工具调用细节的轨迹，研究人员能够深入分析模型在复杂任务中的推理过程、决策逻辑和错误模式。它推动了对智能体规划能力、代码编辑策略以及多步操作稳定性等核心学术议题的探索，为从单纯生成代码片段进化到解决完整软件工程任务提供了关键的检验平台，深刻影响了代码智能体研究的评估范式。

实际应用

在实际应用中，该数据集所支撑的智能体技术能够被集成到现代软件开发工作流中，充当自动化的代码审查助手、缺陷修复机器人或重构工具。例如，在持续集成/持续部署（CI/CD）流水线中，基于此类数据训练的智能体可以自动响应失败的测试用例，分析根因并尝试提交修正补丁，显著缩短开发者排查和解决问题的时间，提升软件迭代速度与代码质量。

数据集最近研究