
zanderjiang/deepseek-v3.2-SWE-Agent

Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/zanderjiang/deepseek-v3.2-SWE-Agent
Description:
---
license: apache-2.0
task_categories:
- text-generation
tags:
- code
- swe-bench
- agent
- deepseek
- execution-trace
pretty_name: DeepSeek V3.2 SWE-Agent Execution Traces
size_categories:
- n<1K
---

# DeepSeek V3.2 SWE-Agent Execution Traces

Full execution traces of **DeepSeek V3.2** running **SWE-Agent** on **SWE-Bench**.

## Dataset Structure

Each JSON file in `traces/` corresponds to one SWE-Bench problem instance. The filename is the instance ID (e.g., `django__django-12345.json`).

### Trace Schema

```json
{
  "instance_id": "django__django-12345",
  "model": "DeepSeek-V3.2",
  "agent": "SWE-agent",
  "total_steps": 15,
  "total_run_duration_seconds": 120.5,
  "exit_status": "submitted",
  "submission": "<git diff patch>",
  "model_stats": {
    "instance_cost": 0.0,
    "tokens_sent": 50000,
    "tokens_received": 5000,
    "api_calls": 15
  },
  "steps": [
    {
      "step_index": 0,
      "timestamp": 1711500000.0,
      "model_input": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."}
      ],
      "model_output": {
        "raw_response": "full text response from the model",
        "thought": "extracted reasoning/thought",
        "action": "bash command or tool call",
        "thinking_blocks": [],
        "tool_calls": [],
        "tool_call_ids": []
      },
      "tool_execution": {
        "command": "find /repo -name '*.py' | head -20",
        "start_timestamp": 1711500001.0,
        "duration_seconds": 0.25,
        "output": "file1.py\nfile2.py\n...",
        "execution_time_reported": 0.25
      },
      "exit_status": null,
      "done": false,
      "submission": null
    }
  ]
}
```

### Fields

| Field | Description |
|-------|-------------|
| `instance_id` | SWE-Bench problem ID |
| `model` | Model name (DeepSeek-V3.2) |
| `total_steps` | Number of agent steps taken |
| `total_run_duration_seconds` | Wall-clock time for the full run |
| `exit_status` | How the run ended (submitted, exit_cost, exit_context, etc.) |
| `submission` | The git diff patch submitted as the solution |
| `model_stats` | Aggregated token/cost/call statistics |
| `steps[].model_input` | Full message history sent to the LLM at this step |
| `steps[].model_output.raw_response` | Complete model response text |
| `steps[].model_output.thought` | Parsed reasoning/thought from the response |
| `steps[].model_output.action` | Parsed action/command from the response |
| `steps[].model_output.thinking_blocks` | Extended thinking blocks (if any) |
| `steps[].model_output.tool_calls` | Function calling tool calls (if any) |
| `steps[].tool_execution.command` | The command executed in the environment |
| `steps[].tool_execution.duration_seconds` | Wall-clock time for tool execution |
| `steps[].tool_execution.output` | Tool/command output (observation) |

## Generation Details

- **Model**: DeepSeek V3.2 served locally via sglang (8x B200 GPUs, TP=8, DP=8)
- **Agent**: SWE-Agent v1.1.0 with function calling
- **Benchmark**: SWE-Bench (lite/verified/full)
- **Config**: `config/deepseek_v3.2_swebench.yaml` in SWE-Agent repo
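
To illustrate how the trace schema described here might be consumed, the following is a minimal Python sketch, not part of the dataset itself. It assumes the `traces/` directory has already been downloaded to a local path. The helper names `load_traces` and `summarize_trace` are hypothetical, and the guard for a missing or `null` `tool_execution` is an assumption about the data rather than something the card states.

```python
import json
from pathlib import Path


def load_traces(traces_dir):
    """Yield one parsed trace dict per JSON file in a local copy of `traces/`."""
    for path in sorted(Path(traces_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield json.load(f)


def summarize_trace(trace):
    """Collect a few headline numbers from a single trace dict."""
    steps = trace.get("steps", [])
    # Assumption: tool_execution may be absent or null on some steps, so guard for it.
    tool_seconds = sum(
        (step.get("tool_execution") or {}).get("duration_seconds") or 0.0
        for step in steps
    )
    stats = trace.get("model_stats") or {}
    return {
        "instance_id": trace.get("instance_id"),
        "exit_status": trace.get("exit_status"),
        "steps_recorded": len(steps),
        "tool_seconds": tool_seconds,
        "tokens_sent": stats.get("tokens_sent", 0),
        "tokens_received": stats.get("tokens_received", 0),
    }


# Hypothetical trace, trimmed down from the schema example in the card.
example = {
    "instance_id": "django__django-12345",
    "exit_status": "submitted",
    "model_stats": {"tokens_sent": 50000, "tokens_received": 5000, "api_calls": 15},
    "steps": [
        {"step_index": 0, "tool_execution": {"command": "ls", "duration_seconds": 0.25}},
        {"step_index": 1, "tool_execution": None},
    ],
}
summary = summarize_trace(example)
print(summary)
```

With a local copy of the files, `[summarize_trace(t) for t in load_traces("traces")]` would give one summary per instance, which can then be grouped by `exit_status` or sorted by `tool_seconds`.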
Provider:
zanderjiang