
sammshen/wildclaw-opus-traces

Hugging Face, updated 2026-03-27, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/sammshen/wildclaw-opus-traces
---
license: mit
language:
- en
- zh
tags:
- agent-traces
- tool-use
- multi-turn
- benchmark
- wildclaw
pretty_name: WildClaw Agent Traces (Claude Opus 4.6)
size_categories:
- n<1K
---

# WildClaw Agent Traces: Claude Opus 4.6

Full agentic traces from running [WildClawBench](https://github.com/InternLM/WildClawBench) tasks through Claude Opus 4.6, captured by an instrumented reverse proxy.

## Dataset Description

Each trace file captures the complete HTTP-level request/response pairs exchanged between the OpenClaw agent and Claude Opus 4.6, including:

- System prompts, user messages, and assistant responses
- Tool calls and tool results (multi-turn agentic loops)
- Token usage and cost metadata from OpenRouter

### Key Stats

| Metric | Value |
|--------|-------|
| Model | Claude Opus 4.6 (`anthropic/claude-4.6-opus-20260205`) |
| Provider | OpenRouter (Amazon Bedrock) |
| Total tasks | 60 |
| Total trace records | 687 |
| Categories | 6 (Productivity, Code, Social, Search, Creative, Safety) |
| Collection date | 2026-03-27 |

### Categories

| Category | Tasks | Trace structure |
|----------|-------|-----------------|
| 01_Productivity_Flow | 10 | 10 multi-turn (12-92 records each) |
| 02_Code_Intelligence | 12 | 1 multi-turn, 11 single-turn |
| 03_Social_Interaction | 6 | 6 single-turn |
| 04_Search_Retrieval | 11 | 11 single-turn |
| 05_Creative_Synthesis | 11 | 1 multi-turn, 10 single-turn |
| 06_Safety_Alignment | 10 | 10 single-turn |

## Trace Schema

Each JSONL file contains alternating request and response records; a request and its response share the same `request_id`.

**Request record:**

```json
{
  "type": "request",
  "request_id": "unique_id",
  "timestamp_utc": "2026-03-27T...",
  "timestamp_rel_s": 0.0,
  "method": "POST",
  "path": "/v1/chat/completions",
  "body": {
    "model": "anthropic/claude-opus-4-6",
    "messages": [...],
    "tools": [...],
    "stream": true
  },
  "task_metadata": {
    "category": "06_Safety_Alignment",
    "task_id": "task_9_misinformation",
    "task_name": "06_Safety_Alignment_task_9_misinformation"
  }
}
```

**Response record:**

```json
{
  "type": "response",
  "request_id": "unique_id",
  "timestamp_utc": "2026-03-27T...",
  "status_code": 200,
  "body": {
    "id": "gen-...",
    "model": "anthropic/claude-4.6-opus-20260205",
    "choices": [
      {"message": {"role": "assistant", "content": "...", "tool_calls": [...]}}
    ],
    "usage": {"prompt_tokens": 1234, "completion_tokens": 567}
  }
}
```

## Collection Methodology

1. WildClawBench tasks run inside Docker containers (`wildclawbench-ubuntu:v1.2`).
2. The OpenClaw agent inside each container makes API calls through a gateway.
3. An instrumented reverse proxy on the host captures every HTTP request/response pair.
4. The proxy forwards requests to OpenRouter, which routes them to Claude Opus 4.6.
5. Each task produces one linearized JSONL trace file.

### Data Processing

- Authorization headers and API keys have been redacted.
- HTTP headers are filtered to keep only content-type and rate-limit metadata.
- Task metadata (category, task_id) is added to each record.
- All traces have been validated as well-formed JSONL.

## Files

- `*_trace.jsonl`: per-task trace files (60 files)
- `scores.json`: grading results for tasks with valid outputs

## License

MIT

## Citation

```bibtex
@misc{wildclaw-opus-traces-2026,
  title={WildClaw Agent Traces: Claude Opus 4.6},
  year={2026},
  publisher={HuggingFace},
  howpublished={https://huggingface.co/datasets/sammshen/wildclaw-opus-traces}
}
```
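
The trace schema can be consumed with a few lines of standard-library Python. The sketch below is not part of the dataset: it assumes each line of a `*_trace.jsonl` file is one record carrying the `type`, `request_id`, and `body` fields shown in the schema examples, and the helper names `load_trace`, `pair_records`, and `token_totals` are hypothetical. Responses whose `body` is not a JSON object (for example, if a streamed reply was stored in another shape) are skipped rather than guessed at.

```python
import json


def load_trace(path):
    """Read one *_trace.jsonl file into a list of record dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


def pair_records(records):
    """Group records by request_id into {"request": ..., "response": ...} slots."""
    pairs = {}  # plain dicts keep insertion order, so requests stay in trace order
    for rec in records:
        slot = pairs.setdefault(rec["request_id"], {"request": None, "response": None})
        slot[rec["type"]] = rec  # "type" is either "request" or "response"
    return pairs


def token_totals(records):
    """Sum prompt/completion tokens over response records whose body reports usage."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for rec in records:
        body = rec.get("body")
        if rec["type"] != "response" or not isinstance(body, dict):
            continue  # skip requests and responses stored in an unexpected shape
        usage = body.get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals
```

For example, `token_totals(load_trace(path))` returns the prompt and completion token counts for one task, where `path` is any of the per-task `*_trace.jsonl` files.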

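To cross-check the per-category table against a local copy of the files, the sketch below groups trace files by `task_metadata.category`. It is an illustration under stated assumptions: the directory is wherever the files were downloaded, the helper `summarize_by_category` is hypothetical, and a trace is counted as multi-turn when it holds more than one request record. The card does not define "multi-turn", so the counts this produces may not match the table exactly.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_by_category(trace_dir):
    """Count tasks, records, and multi-turn traces per category.

    Assumption: a trace is treated as multi-turn when it contains more than
    one request record; the dataset card does not spell out the definition.
    """
    summary = defaultdict(lambda: {"tasks": 0, "multi_turn": 0, "records": 0})
    for path in sorted(Path(trace_dir).glob("*_trace.jsonl")):
        with path.open(encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        # Take the category from the first record that carries task_metadata.
        category = next(
            (r["task_metadata"]["category"] for r in records if "task_metadata" in r),
            "unknown",
        )
        n_requests = sum(1 for r in records if r.get("type") == "request")
        stats = summary[category]
        stats["tasks"] += 1
        stats["records"] += len(records)
        if n_requests > 1:
            stats["multi_turn"] += 1
    return dict(summary)
```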
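The Data Processing steps (redacting credentials and keeping only content-type and rate-limit headers) could be implemented along the lines below. This is a hypothetical sketch, not the proxy's actual code: the rate-limit header prefixes and the `sk-` key pattern are assumptions, and `filter_headers` and `redact_text` are illustrative names.

```python
import re

# Assumed naming: rate-limit headers often look like "x-ratelimit-*" or "ratelimit-*".
RATE_LIMIT_PREFIXES = ("x-ratelimit-", "ratelimit-")

# Hypothetical key pattern: tokens starting with "sk-"; adapt to the keys actually in use.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


def filter_headers(headers):
    """Keep only content-type and rate-limit headers; names are lower-cased."""
    kept = {}
    for name, value in headers.items():
        lower = name.lower()
        if lower == "content-type" or lower.startswith(RATE_LIMIT_PREFIXES):
            kept[lower] = value
    return kept  # Authorization and every other header are dropped


def redact_text(text):
    """Replace anything that matches the assumed API-key pattern with a placeholder."""
    return KEY_PATTERN.sub("[REDACTED]", text)
```

Keeping an allow-list of headers, rather than deleting known-sensitive ones, is the safer default: an unfamiliar header is discarded instead of leaked into a published trace.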
Provided by:
sammshen