five

obaydata/openclaw-trajectories-demo

收藏
Hugging Face2026-04-09 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/obaydata/openclaw-trajectories-demo
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: cc-by-nc-4.0 language: - en - zh tags: - agentic-trajectories - tool-use - multi-turn - chain-of-thought - reasoning - openclaw - agent-training - sft - rlhf size_categories: - n<1K task_categories: - text-generation - question-answering pretty_name: "OpenClaw Trajectories Demo" dataset_info: - config_name: qa_overview features: - name: query_id dtype: string - name: user_query dtype: string - name: final_answer dtype: string - name: n_rounds dtype: int32 - config_name: trajectories_meta features: - name: query_id dtype: string - name: session_id dtype: string - name: collected_at dtype: string - name: agent_id dtype: string - name: model dtype: string - name: user_query dtype: string - name: final_answer dtype: string - name: n_rounds dtype: int32 - name: n_steps dtype: int32 - name: system_prompt_len dtype: int32 - name: tools_count dtype: int32 - name: has_tool_calls dtype: bool configs: - config_name: qa_overview data_files: "qa_overview.jsonl" - config_name: trajectories_meta data_files: "trajectories_meta.jsonl" - config_name: full_trajectories data_files: "*/trajectory.json" default: true --- # OpenClaw Trajectories Demo > **Rubrics-based review** and **human verification** are available as add-on services on client request. Contact us for details. 8 sample execution trajectories collected from [OpenClaw](https://docs.openclaw.ai). Each sample is a complete real-time conversation snapshot of an OpenClaw agent, capturing the model's input (system prompt + tool definitions + user query + history), output (reasoning content + final answer + tool calls), and the agent's state (workspace files + memory knowledge base). --- ## I. Dataset Overview | Item | Value | |---|---| | Sample count | 8 | | Model | `moonshot/kimi-k2.5` | | Language | English (system prompt + query + answer) | | Persona count | 4 (`coder-py` / `finance-cn` / `writer-tech` / `researcher`) | | Samples per persona | 2 | | Average system prompt length | ~17,500 characters | | Tools schema count | 18 / sample | | Disk footprint per sample | ~200 KB (including the sqlite memory index) | Each sample is a **directory** (not a single file) containing several structured files. One sample corresponds to **one complete "user asks → agent thinks → agent calls tools → agent answers"** lifecycle. --- ## II. Persona Design The dataset showcases 4 AI assistant personas from different domains. Each persona has its own independent workspace and memory: | Persona ID | Role | Key traits | |---|---|---| | `coder-py` | Senior Python engineer | 10+ years of experience; pragmatic and precise; insists on type hints, unit tests, minimal diffs | | `finance-cn` | Equity research analyst | US / HK / A-share specialist; strictly separates objective data from opinions; always cites the basis when quoting numbers | | `writer-tech` | Technical writer | Specializes in release notes / runbooks / API docs; active voice, structured, zero marketing speak | | `researcher` | Research assistant | Must cite sources; cross-checks against two independent sources before drawing conclusions | Each persona is described in its **workspace** through 8 `.md` files (role, personality, tool preferences, default user persona, etc.) and holds 3 domain-knowledge `.md` files in its **memory** (background guidelines, historical incidents, glossary, etc.). These files are injected into the system prompt at runtime, or loaded on demand by the `read` tool. --- ## III. Directory Structure ``` openclaw_trajectories_demo/ ├── readme.md ├── coder-py-q01/ # one sample = one directory │ ├── trajectory.json # full execution trajectory (the core training file) │ ├── qa.json # compact query + answer (for fast browsing / indexing) │ ├── workspace/ # workspace snapshot at the moment the agent enters this query │ │ ├── AGENTS.md # role description / working style / tool conventions │ │ ├── IDENTITY.md # identity / role definition │ │ ├── SOUL.md # personality / tone / values │ │ ├── USER.md # default user persona │ │ ├── TOOLS.md # tool usage notes │ │ ├── MEMORY.md # memory file index (tells the agent what is loadable) │ │ ├── BOOTSTRAP.md # bootstrap notes │ │ └── HEARTBEAT.md # heartbeat / status notes │ └── memory/ # long-term memory snapshot (loaded on demand via `read`) │ ├── <slot>.sqlite # OpenClaw built-in FTS index database │ ├── python_style.md # domain-knowledge file 1 │ ├── past_incidents.md # domain-knowledge file 2 │ └── preferred_libs.md # domain-knowledge file 3 │ ├── coder-py-q02/ # ...same directory layout ├── finance-cn-q01/ ├── finance-cn-q02/ ├── writer-tech-q01/ ├── writer-tech-q02/ ├── researcher-q01/ └── researcher-q02/ ``` The 4 components inside each sample directory: | Path | Type | Purpose | |---|---|---| | `trajectory.json` | JSON file | Main training payload — contains all model I/O | | `qa.json` | JSON file (~250 B) | Just the query + final answer; useful for quick grep / stats | | `workspace/` | directory (8 `.md`) | Snapshot of the agent's workspace at session start | | `memory/` | directory (1 sqlite + N md) | Snapshot of the agent's long-term memory | --- ## IV. `trajectory.json` Field Reference Each `trajectory.json` is a JSON object with the following top-level fields: | Field | Type | Description | |---|---|---| | `schema_version` | string | Data format version (`openclaw-traj-v1`) | | `session_id` | string | OpenClaw internal session UUID | | `query_id` | string | Unique sample ID; matches the directory name | | `collected_at` | ISO 8601 string | Collection timestamp (UTC) | | `agent` | object | Agent metadata: `id` / `model` / `thinking_config` / `workspace_source` | | `workspace` | object | Workspace reference: `dir` (relative path) + `files` (list of file names) | | `memory` | object | Memory reference: `dir` + `copied` (file list) + `tables` (sqlite table list) | | `system_prompt` | string | **Full system prompt** (~17,500 characters; comes directly from LLM request `messages[0]`) | | `tools_schema` | array[object] | **Full JSON Schema definitions for all 18 tools** (function / parameters / required) | | `user_query` | string | Original user input | | `steps` | array[object] | **Step-by-step execution trace** (the central field, see below) | | `final_answer` | string | Final answer (= the `content` of the last agent step) | | `n_rounds` | int | Number of LLM call rounds (= tool call count + 1) | ### `steps` array structure Each step represents one event in the conversation. There are 3 possible `source` values: #### 1. `source: "user"` — user message ```json { "step_id": 1, "source": "user", "content": "Write a Python function that takes a list of integers and returns the second-largest unique value..." } ``` #### 2. `source: "agent"` — model response ```json { "step_id": 2, "source": "agent", "reasoning_content": "The user wants... I should first check...", "content": "```python\ndef second_max_unique(...):\n ...\n```", "tool_calls": [ { "id": "exec0", "type": "function", "name": "exec", "arguments": {"command": "python second_largest.py"}, "arguments_raw": "{\"command\":\"python second_largest.py\"}" } ] } ``` | Field | Description | |---|---| | `reasoning_content` | The model's internal chain of thought (Kimi's native `reasoning_content` field; equivalent to OpenAI / Claude thinking blocks) | | `content` | The text the model surfaces to the user | | `tool_calls` | Tools the model invoked this round (can be an empty array) | | `tool_calls[].id` | Tool call ID; correlated with the next `tool_result` step | | `tool_calls[].name` | Tool name (matches a `function.name` in `tools_schema`) | | `tool_calls[].arguments` | Parsed JSON arguments object | | `tool_calls[].arguments_raw` | Original JSON string (for training that needs the literal form) | #### 3. `source: "tool_result"` — tool execution result ```json { "step_id": 3, "source": "tool_result", "tool_call_id": "exec0", "content": "All tests passed.\n" } ``` | Field | Description | |---|---| | `tool_call_id` | Matches a `tool_calls[].id` from the previous agent step | | `content` | Tool output (stdout / stderr / return value) | ### Step ordering rules - The first step is always `source: "user"` - The middle steps strictly alternate between `agent` and `tool_result` (agent invokes a tool → tool_result returns the output → agent continues) - The last step is always `source: "agent"`, and its `content` equals `final_answer` --- ## V. Field-to-Training-Objective Mapping | Training paradigm | Fields used | |---|---| | **SFT (supervised fine-tuning)** | `system_prompt` + `tools_schema` + `user_query` + (intermediate `tool_calls` + `tool_result`) → `content` as the label | | **CoT / Reasoning SFT** | Same as above, but `reasoning_content` is also used as a label | | **DPO / RLHF** | A trajectory becomes a chosen / rejected candidate (extra scoring required) | | **Tool-use training** | `tools_schema` (as model input) + `tool_calls` (as label) | | **Multi-turn agent SFT** | The full `steps` array unrolled into (input, output) pairs | --- ## VI. Memory Files The `.md` files inside `memory/` are the **domain background knowledge** the agent holds at the moment of this query. OpenClaw exposes these to the agent in two ways: 1. **Injected into the system prompt**: The `MEMORY.md` file at the workspace root automatically lists all available memory file paths. The system prompt already contains this index. 2. **Loaded at runtime**: The agent calls `read` on a specific path (e.g. `read("memory/python_style.md")`). `<slot>.sqlite` is OpenClaw's built-in FTS index database (10 tables). In the default FTS-only mode the index is empty, but the schema is intact and downstream consumers can re-index it as needed. Memory files per persona: | Persona | Memory files | |---|---| | `coder-py` | `python_style.md` (code conventions) / `past_incidents.md` (postmortems) / `preferred_libs.md` (recommended libs) | | `finance-cn` | `estimation_methods.md` (valuation methods) / `key_metrics.md` (financial metrics) / `watch_list.md` (sector watch list) | | `writer-tech` | `style_guide.md` (writing style guide) / `doc_templates.md` (doc templates) / `common_mistakes.md` (common mistakes) | | `researcher` | `citation_format.md` (citation rules) / `preferred_sources.md` (trusted sources) / `research_method.md` (research methodology) | --- ## VII. Workspace Files The 8 `.md` files in `workspace/` are **automatically loaded by OpenClaw at runtime and concatenated into the system prompt**. They play a role similar to `.cursorrules` / `AGENTS.md` configuration in coding agents like Cursor / Continue: | File | Contents | |---|---| | `AGENTS.md` | Role description + workflow + tool usage conventions (the main body, ~7-8 KB) | | `IDENTITY.md` | Identity / role definition (~600 B) | | `SOUL.md` | Personality / tone / values (~1.5 KB) | | `USER.md` | Default user persona (~500 B) | | `TOOLS.md` | Tool usage notes (~850 B) | | `MEMORY.md` | Memory file index (tells the agent what is loadable) | | `BOOTSTRAP.md` | Bootstrap notes (auto-generated by OpenClaw) | | `HEARTBEAT.md` | Heartbeat / status notes (auto-generated by OpenClaw) | The full system prompt that ends up in the LLM call is **already saved** in `trajectory.json.system_prompt`. Consumers do **not** need to re-concatenate manually — just use that field directly. The `workspace/` directory exists primarily for: - **Reproducibility**: Anyone can rebuild the system prompt from these source files - **Interpretability**: A human reviewer can quickly see "why this agent behaves this way" - **Customization**: To fork a new persona, simply edit these files --- ## VIII. Single-Sample Statistics Example Take `coder-py-q01` as an example: | Item | Value | |---|---| | User query | "Write a Python function that takes a list of integers and returns the second-largest unique value..." | | Final answer length | 733 characters (full Python code + tests + complexity analysis) | | n_rounds | 1 (no tool calls; answered directly) | | Steps count | 2 (1 user + 1 agent) | | system_prompt length | 17,549 characters | | tools_schema count | 18 (read / write / edit / exec / web_search / web_fetch / memory_search / ...) | | reasoning_content length | ~500 characters (Kimi's internal reasoning) | | Workspace files | 8 | | Memory files | 4 (1 sqlite + 3 md) | | Total directory size | ~200 KB | `finance-cn-q01` showcases the other typical case — with tool calls: | Item | Value | |---|---| | n_rounds | 4 (3 tool calls) | | Steps count | 8 (1 user + 4 agent + 3 tool_result) | | Tool usage | `memory_search` → `read("memory/estimation_methods.md")` → `read("memory/key_metrics.md")` → final synthesis | This sample fully demonstrates a typical multi-round workflow where the agent actively loads memory files to support its answer. --- ## IX. Collection Methodology The data is captured from real OpenClaw runtime LLM API calls via an HTTP interceptor — **it is not synthesized or post-edited**. The collection pipeline: 1. Run a local HTTP proxy 2. Configure OpenClaw to point its LLM API base URL at the proxy 3. The proxy fully records every request (system_prompt + tools_schema + messages) and response (content + reasoning_content + tool_calls) 4. For each user query, run `openclaw agent --local` once; the proxy captures all LLM I/O for that run 5. Post-processing: combine the proxy logs + workspace files + memory files into the directory structure described above **Guarantees**: - `system_prompt` / `tools_schema` / `reasoning_content` come directly from real LLM API I/O, **without any editing** - `workspace/` and `memory/` are actual snapshots of the state at query start time - Workspaces are strictly isolated between queries (automatically restored after each run); samples never pollute each other --- ## X. Field Completeness Guarantee | Field | Satisfied by 8 / 8 samples | |---|---| | `system_prompt` (≥ 1000 chars) | ✓ | | `user_query` (non-empty) | ✓ | | `tools_schema` (≥ 5 tools, with full JSON Schema) | ✓ | | `steps` (at least 2 steps, first = user, last = agent) | ✓ | | `final_answer` (non-empty) | ✓ | | `agent` metadata | ✓ | | `workspace/` (all 8 `.md` files present) | ✓ | | `memory/` (1 sqlite + ≥ 3 md files) | ✓ | | `reasoning_content` (in every agent step) | ✓ | | `tool_calls` (samples that use tools have id / name / arguments) | ✓ | | `tool_result` (samples that use tools have a corresponding result) | ✓ |
提供机构:
obaydata
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作