sammshen/wildclaw-opus-traces
Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/sammshen/wildclaw-opus-traces
Description:
---
license: mit
language:
- en
- zh
tags:
- agent-traces
- tool-use
- multi-turn
- benchmark
- wildclaw
pretty_name: WildClaw Agent Traces (Claude Opus 4.6)
size_categories:
- n<1K
---
# WildClaw Agent Traces — Claude Opus 4.6
Complete agent traces from running [WildClawBench](https://github.com/InternLM/WildClawBench) tasks with Claude Opus 4.6, captured at the HTTP level by an instrumented reverse proxy.
## Dataset Description
Each trace file captures the complete HTTP-level request/response pairs between the OpenClaw agent and Claude Opus 4.6, including:
- System prompts, user messages, and assistant responses
- Tool calls and tool results (multi-turn agentic loops)
- Token usage and cost metadata from OpenRouter
### Key Stats
| Metric | Value |
|--------|-------|
| Model | Claude Opus 4.6 (`anthropic/claude-4.6-opus-20260205`) |
| Provider | OpenRouter (Amazon Bedrock) |
| Total tasks | 60 |
| Total trace records | 687 |
| Categories | 6 (Productivity, Code, Social, Search, Creative, Safety) |
| Collection date | 2026-03-27 |
### Categories
| Category | Tasks | Trace breakdown |
|----------|-------|-----------------|
| 01_Productivity_Flow | 10 | 10 multi-turn (12-92 records each) |
| 02_Code_Intelligence | 12 | 1 multi-turn, 11 single-turn |
| 03_Social_Interaction | 6 | 6 single-turn |
| 04_Search_Retrieval | 11 | 11 single-turn |
| 05_Creative_Synthesis | 11 | 1 multi-turn, 10 single-turn |
| 06_Safety_Alignment | 10 | 10 single-turn |
## Trace Schema
Each JSONL file contains alternating request and response records:
**Request record:**
```json
{
  "type": "request",
  "request_id": "unique_id",
  "timestamp_utc": "2026-03-27T...",
  "timestamp_rel_s": 0.0,
  "method": "POST",
  "path": "/v1/chat/completions",
  "body": {
    "model": "anthropic/claude-opus-4-6",
    "messages": [...],
    "tools": [...],
    "stream": true
  },
  "task_metadata": {
    "category": "06_Safety_Alignment",
    "task_id": "task_9_misinformation",
    "task_name": "06_Safety_Alignment_task_9_misinformation"
  }
}
```
**Response record:**
```json
{
  "type": "response",
  "request_id": "unique_id",
  "timestamp_utc": "2026-03-27T...",
  "status_code": 200,
  "body": {
    "id": "gen-...",
    "model": "anthropic/claude-4.6-opus-20260205",
    "choices": [{"message": {"role": "assistant", "content": "...", "tool_calls": [...]}}],
    "usage": {"prompt_tokens": 1234, "completion_tokens": 567}
  }
}
```
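Because a request and its response share a `request_id`, the two record types can be joined after loading. The following is a minimal sketch, assuming only the fields shown in the schema above; the helper names `pair_records` and `total_usage` are hypothetical, and the code tolerates responses whose `body` is not a JSON object or carries no `usage` block.

```python
import json
from collections import defaultdict


def pair_records(lines):
    """Group request/response records by their shared request_id."""
    pairs = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # record["type"] is "request" or "response" per the schema above
        pairs[record["request_id"]][record["type"]] = record
    return dict(pairs)


def total_usage(pairs):
    """Sum prompt and completion tokens over responses that report usage."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for pair in pairs.values():
        body = pair.get("response", {}).get("body")
        # Assumption: some bodies may be missing or not JSON objects.
        usage = body.get("usage") if isinstance(body, dict) else None
        for key in totals:
            totals[key] += (usage or {}).get(key, 0)
    return totals
```

A file can be passed in directly, for example `pair_records(open(path, encoding="utf-8"))`, where `path` points at one of the `*_trace.jsonl` files.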
## Collection Methodology
1. WildClawBench tasks run inside Docker containers (`wildclawbench-ubuntu:v1.2`)
2. The OpenClaw agent inside each container makes API calls through a gateway
3. An instrumented reverse proxy on the host captures all HTTP request/response pairs
4. The proxy forwards requests to OpenRouter, which routes them to Claude Opus 4.6
5. Each task produces one linearized JSONL trace file
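The capture step can be pictured with a short sketch. This is not the proxy's actual code: `record_exchange` and the `forward` callable are hypothetical stand-ins, and the code assumes that `timestamp_rel_s` is measured from the start of the task run, which the card does not state.

```python
import json
import time
import uuid
from datetime import datetime, timezone


def utc_now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def record_exchange(forward, path, body, task_metadata, run_start, out):
    """Log one request/response pair to `out` as two JSONL lines.

    `forward` is a hypothetical callable standing in for the upstream call;
    it takes (path, body) and returns (status_code, response_body).
    `run_start` is a time.monotonic() reading taken when the task began.
    """
    request_id = uuid.uuid4().hex  # shared by both records of the pair
    request_record = {
        "type": "request",
        "request_id": request_id,
        "timestamp_utc": utc_now(),
        "timestamp_rel_s": round(time.monotonic() - run_start, 3),
        "method": "POST",
        "path": path,
        "body": body,
        "task_metadata": task_metadata,
    }
    out.write(json.dumps(request_record) + "\n")

    # Hand the request to the upstream and wait for its reply.
    status_code, response_body = forward(path, body)

    response_record = {
        "type": "response",
        "request_id": request_id,
        "timestamp_utc": utc_now(),
        "status_code": status_code,
        "body": response_body,
    }
    out.write(json.dumps(response_record) + "\n")
    return status_code, response_body
```

Writing the request record before forwarding keeps the file linear: each request line is immediately followed by its response line as long as calls are made one at a time.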
### Data Processing
- Authorization headers and API keys have been redacted
- HTTP headers were filtered to keep only content-type and rate-limit metadata
- Task metadata (category, task_id) was added to each record
- All traces were checked to be valid JSONL
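The header filtering and JSONL check described above could look roughly like the sketch below. It is an illustration under assumptions, not the pipeline that produced the dataset: the function names are hypothetical, and the prefixes used to recognize rate-limit headers are a guess at common naming.

```python
import json

# Assumption: header names starting with these prefixes carry rate-limit metadata.
RATE_LIMIT_PREFIXES = ("x-ratelimit-", "ratelimit-")


def filter_headers(headers):
    """Keep only content-type and rate-limit headers (names are case-insensitive)."""
    kept = {}
    for name, value in headers.items():
        lower = name.lower()
        if lower == "content-type" or lower.startswith(RATE_LIMIT_PREFIXES):
            kept[name] = value
    return kept


def invalid_jsonl_lines(text):
    """Return 1-based numbers of non-empty lines that are not JSON objects."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if not isinstance(obj, dict):
            bad.append(number)  # each record is expected to be an object
    return bad
```

Using an allow-list means that `Authorization` and any other credential-bearing header is dropped by default, without having to enumerate every sensitive name.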
## Files
- `*_trace.jsonl` — Per-task trace files (60 files)
- `scores.json` — Grading results for tasks with valid outputs
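Once the files are downloaded, a quick sanity check is to count the records in each trace file. The sketch below assumes the files sit in a local directory; `count_records` is a hypothetical helper, and it does not touch `scores.json`, whose structure is not described here.

```python
import json
from pathlib import Path


def count_records(trace_dir):
    """Map each *_trace.jsonl file name in trace_dir to its number of records."""
    counts = {}
    for path in sorted(Path(trace_dir).glob("*_trace.jsonl")):
        with path.open(encoding="utf-8") as handle:
            # Parse every non-empty line so malformed JSON fails loudly.
            counts[path.name] = sum(1 for line in handle if line.strip() and json.loads(line) is not None)
    return counts
```

If the local copy is complete, the result should have 60 entries whose values sum to 687, matching the Key Stats table.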
## License
MIT
## Citation
```bibtex
@misc{wildclaw-opus-traces-2026,
title={WildClaw Agent Traces: Claude Opus 4.6},
year={2026},
publisher={HuggingFace},
howpublished={https://huggingface.co/datasets/sammshen/wildclaw-opus-traces}
}
```
Published by:
sammshen



