zanderjiang/deepseek-v3.2-SWE-Agent
Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/zanderjiang/deepseek-v3.2-SWE-Agent
Description:
---
license: apache-2.0
task_categories:
- text-generation
tags:
- code
- swe-bench
- agent
- deepseek
- execution-trace
pretty_name: DeepSeek V3.2 SWE-Agent Execution Traces
size_categories:
- n<1K
---
# DeepSeek V3.2 SWE-Agent Execution Traces
Full execution traces of **DeepSeek V3.2** running **SWE-Agent** on **SWE-Bench**.
## Dataset Structure
Each JSON file in `traces/` corresponds to one SWE-Bench problem instance. The filename is the instance ID (e.g., `django__django-12345.json`).
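
The layout above can be read with a few lines of Python. This is a minimal sketch, assuming the dataset has already been downloaded so that `traces/` exists locally; the helper name `load_traces` is hypothetical and not part of the dataset or of SWE-Agent.

```python
import json
from pathlib import Path


def load_traces(traces_dir):
    """Return {instance_id: trace_dict} for every *.json file in traces_dir.

    Hypothetical helper: assumes one JSON object per file, as the
    dataset card describes.
    """
    traces = {}
    for path in sorted(Path(traces_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            trace = json.load(f)
        # The filename (without .json) is the instance ID.
        traces[path.stem] = trace
    return traces
```

Calling `load_traces("traces")` would then give a dictionary keyed by instance ID, for example `django__django-12345`.
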
### Trace Schema
```json
{
  "instance_id": "django__django-12345",
  "model": "DeepSeek-V3.2",
  "agent": "SWE-agent",
  "total_steps": 15,
  "total_run_duration_seconds": 120.5,
  "exit_status": "submitted",
  "submission": "<git diff patch>",
  "model_stats": {
    "instance_cost": 0.0,
    "tokens_sent": 50000,
    "tokens_received": 5000,
    "api_calls": 15
  },
  "steps": [
    {
      "step_index": 0,
      "timestamp": 1711500000.0,
      "model_input": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."}
      ],
      "model_output": {
        "raw_response": "full text response from the model",
        "thought": "extracted reasoning/thought",
        "action": "bash command or tool call",
        "thinking_blocks": [],
        "tool_calls": [],
        "tool_call_ids": []
      },
      "tool_execution": {
        "command": "find /repo -name '*.py' | head -20",
        "start_timestamp": 1711500001.0,
        "duration_seconds": 0.25,
        "output": "file1.py\nfile2.py\n...",
        "execution_time_reported": 0.25
      },
      "exit_status": null,
      "done": false,
      "submission": null
    }
  ]
}
```
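
To illustrate how the nested `steps` array might be traversed, here is a small sketch that flattens one trace into per-step rows. The function name `summarize_steps` is hypothetical, and the defensive `or {}` handling assumes that `model_output` or `tool_execution` could be missing or null on some steps, which the schema above does not confirm.

```python
def summarize_steps(trace):
    """Flatten a trace dict into one small row per agent step."""
    rows = []
    for step in trace.get("steps", []):
        # Assumption: either sub-object may be absent or null.
        output = step.get("model_output") or {}
        tool = step.get("tool_execution") or {}
        rows.append({
            "step_index": step.get("step_index"),
            "action": output.get("action"),
            "command": tool.get("command"),
            "duration_seconds": tool.get("duration_seconds"),
            "done": step.get("done", False),
        })
    return rows


# A trimmed trace in the shape of the schema above (values are illustrative).
example_trace = {
    "instance_id": "django__django-12345",
    "steps": [
        {
            "step_index": 0,
            "model_output": {"action": "bash command or tool call"},
            "tool_execution": {
                "command": "find /repo -name '*.py' | head -20",
                "duration_seconds": 0.25,
            },
            "done": False,
        }
    ],
}

for row in summarize_steps(example_trace):
    print(row)
```
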
### Fields
| Field | Description |
|-------|-------------|
| `instance_id` | SWE-Bench problem ID |
| `model` | Model name (DeepSeek-V3.2) |
| `total_steps` | Number of agent steps taken |
| `total_run_duration_seconds` | Wall-clock time for the full run |
| `exit_status` | How the run ended (submitted, exit_cost, exit_context, etc.) |
| `submission` | The git diff patch submitted as the solution |
| `model_stats` | Aggregated token/cost/call statistics |
| `steps[].model_input` | Full message history sent to the LLM at this step |
| `steps[].model_output.raw_response` | Complete model response text |
| `steps[].model_output.thought` | Parsed reasoning/thought from the response |
| `steps[].model_output.action` | Parsed action/command from the response |
| `steps[].model_output.thinking_blocks` | Extended thinking blocks (if any) |
| `steps[].model_output.tool_calls` | Function calling tool calls (if any) |
| `steps[].tool_execution.command` | The command executed in the environment |
| `steps[].tool_execution.duration_seconds` | Wall-clock time for tool execution |
| `steps[].tool_execution.output` | Tool/command output (observation) |
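
The run-level fields in the table lend themselves to simple aggregation across instances. The sketch below counts `exit_status` values and sums the `model_stats` counters; `aggregate_runs` is a hypothetical name, and treating missing or null statistics as zero is an assumption rather than documented behavior.

```python
from collections import Counter


def aggregate_runs(traces):
    """Count exit statuses and sum token/call statistics over trace dicts."""
    exit_counts = Counter()
    totals = {"tokens_sent": 0, "tokens_received": 0, "api_calls": 0}
    for trace in traces:
        exit_counts[trace.get("exit_status")] += 1
        stats = trace.get("model_stats") or {}
        for key in totals:
            # Assumption: a missing or null counter contributes zero.
            totals[key] += stats.get(key) or 0
    return exit_counts, totals
```

Passing the values of a dictionary of loaded traces to `aggregate_runs` returns a `Counter` keyed by exit status together with the summed token and API-call totals.
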
## Generation Details
- **Model**: DeepSeek V3.2 served locally via sglang (8x B200 GPUs, TP=8, DP=8)
- **Agent**: SWE-Agent v1.1.0 with function calling
- **Benchmark**: SWE-Bench (lite/verified/full)
- **Config**: `config/deepseek_v3.2_swebench.yaml` in SWE-Agent repo
Provided by:
zanderjiang



