sammshen/wildclaw-opus-traces

Name: sammshen/wildclaw-opus-traces
Creator: sammshen
Published: 2026-03-27 22:23:07
License: MIT

Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/sammshen/wildclaw-opus-traces

Description:

---
license: mit
language:
- en
- zh
tags:
- agent-traces
- tool-use
- multi-turn
- benchmark
- wildclaw
pretty_name: WildClaw Agent Traces (Claude Opus 4.6)
size_categories:
- n<1K
---

# WildClaw Agent Traces: Claude Opus 4.6

Full agentic traces from running [WildClawBench](https://github.com/InternLM/WildClawBench) tasks through Claude Opus 4.6 via an instrumented reverse proxy.

## Dataset Description

Each trace file captures the complete HTTP-level request/response pairs between the OpenClaw agent and Claude Opus 4.6, including:

- System prompts, user messages, and assistant responses
- Tool calls and tool results (multi-turn agentic loops)
- Token usage and cost metadata from OpenRouter

### Key Stats

| Metric | Value |
|--------|-------|
| Model | Claude Opus 4.6 (`anthropic/claude-4.6-opus-20260205`) |
| Provider | OpenRouter (Amazon Bedrock) |
| Total tasks | 60 |
| Total trace records | 687 |
| Categories | 6 (Productivity, Code, Social, Search, Creative, Safety) |
| Collection date | 2026-03-27 |

### Categories

| Category | Tasks | Multi-turn traces |
|----------|-------|-------------------|
| 01_Productivity_Flow | 10 | 10 (12-92 records each) |
| 02_Code_Intelligence | 12 | 1 multi-turn, 11 single-turn |
| 03_Social_Interaction | 6 | 6 single-turn |
| 04_Search_Retrieval | 11 | 11 single-turn |
| 05_Creative_Synthesis | 11 | 1 multi-turn, 10 single-turn |
| 06_Safety_Alignment | 10 | 10 single-turn |

## Trace Schema

Each JSONL file contains alternating request and response records:

**Request record:**

```json
{
  "type": "request",
  "request_id": "unique_id",
  "timestamp_utc": "2026-03-27T...",
  "timestamp_rel_s": 0.0,
  "method": "POST",
  "path": "/v1/chat/completions",
  "body": {
    "model": "anthropic/claude-opus-4-6",
    "messages": [...],
    "tools": [...],
    "stream": true
  },
  "task_metadata": {
    "category": "06_Safety_Alignment",
    "task_id": "task_9_misinformation",
    "task_name": "06_Safety_Alignment_task_9_misinformation"
  }
}
```

**Response record:**

```json
{
  "type": "response",
  "request_id": "unique_id",
  "timestamp_utc": "2026-03-27T...",
  "status_code": 200,
  "body": {
    "id": "gen-...",
    "model": "anthropic/claude-4.6-opus-20260205",
    "choices": [{"message": {"role": "assistant", "content": "...", "tool_calls": [...]}}],
    "usage": {"prompt_tokens": 1234, "completion_tokens": 567}
  }
}
```

## Collection Methodology

1. WildClawBench tasks run inside Docker containers (`wildclawbench-ubuntu:v1.2`)
2. The OpenClaw agent inside each container makes API calls through a gateway
3. An instrumented reverse proxy on the host captures all HTTP request/response pairs
4. Proxy forwards requests to OpenRouter, which routes to Claude Opus 4.6
5. Each task produces one linearized JSONL trace file

### Data Processing

- Authorization headers and API keys have been redacted
- HTTP headers filtered to keep only content-type and rate-limit metadata
- Task metadata (category, task_id) added to each record
- All traces validated as valid JSONL

## Files

- `*_trace.jsonl`: Per-task trace files (60 files)
- `scores.json`: Grading results for tasks with valid outputs

## License

MIT

## Citation

```bibtex
@misc{wildclaw-opus-traces-2026,
  title={WildClaw Agent Traces: Claude Opus 4.6},
  year={2026},
  publisher={HuggingFace},
  howpublished={https://huggingface.co/datasets/sammshen/wildclaw-opus-traces}
}
```


Provider:

sammshen
