obaydata/openclaw-trajectories-demo
收藏Hugging Face2026-04-09 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/obaydata/openclaw-trajectories-demo
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-nc-4.0
language:
- en
- zh
tags:
- agentic-trajectories
- tool-use
- multi-turn
- chain-of-thought
- reasoning
- openclaw
- agent-training
- sft
- rlhf
size_categories:
- n<1K
task_categories:
- text-generation
- question-answering
pretty_name: "OpenClaw Trajectories Demo"
dataset_info:
- config_name: qa_overview
features:
- name: query_id
dtype: string
- name: user_query
dtype: string
- name: final_answer
dtype: string
- name: n_rounds
dtype: int32
- config_name: trajectories_meta
features:
- name: query_id
dtype: string
- name: session_id
dtype: string
- name: collected_at
dtype: string
- name: agent_id
dtype: string
- name: model
dtype: string
- name: user_query
dtype: string
- name: final_answer
dtype: string
- name: n_rounds
dtype: int32
- name: n_steps
dtype: int32
- name: system_prompt_len
dtype: int32
- name: tools_count
dtype: int32
- name: has_tool_calls
dtype: bool
configs:
- config_name: qa_overview
data_files: "qa_overview.jsonl"
- config_name: trajectories_meta
data_files: "trajectories_meta.jsonl"
- config_name: full_trajectories
data_files: "*/trajectory.json"
default: true
---
# OpenClaw Trajectories Demo
> **Rubrics-based review** and **human verification** are available as add-on services on client request. Contact us for details.
8 sample execution trajectories collected from [OpenClaw](https://docs.openclaw.ai). Each sample is a complete real-time conversation snapshot of an OpenClaw agent, capturing the model's input (system prompt + tool definitions + user query + history), output (reasoning content + final answer + tool calls), and the agent's state (workspace files + memory knowledge base).
---
## I. Dataset Overview
| Item | Value |
|---|---|
| Sample count | 8 |
| Model | `moonshot/kimi-k2.5` |
| Language | English (system prompt + query + answer) |
| Persona count | 4 (`coder-py` / `finance-cn` / `writer-tech` / `researcher`) |
| Samples per persona | 2 |
| Average system prompt length | ~17,500 characters |
| Tools schema count | 18 / sample |
| Disk footprint per sample | ~200 KB (including the sqlite memory index) |
Each sample is a **directory** (not a single file) containing several structured files. One sample corresponds to **one complete "user asks → agent thinks → agent calls tools → agent answers"** lifecycle.
---
## II. Persona Design
The dataset showcases 4 AI assistant personas from different domains. Each persona has its own independent workspace and memory:
| Persona ID | Role | Key traits |
|---|---|---|
| `coder-py` | Senior Python engineer | 10+ years of experience; pragmatic and precise; insists on type hints, unit tests, minimal diffs |
| `finance-cn` | Equity research analyst | US / HK / A-share specialist; strictly separates objective data from opinions; always cites the basis when quoting numbers |
| `writer-tech` | Technical writer | Specializes in release notes / runbooks / API docs; active voice, structured, zero marketing speak |
| `researcher` | Research assistant | Must cite sources; cross-checks against two independent sources before drawing conclusions |
Each persona is described in its **workspace** through 8 `.md` files (role, personality, tool preferences, default user persona, etc.) and holds 3 domain-knowledge `.md` files in its **memory** (background guidelines, historical incidents, glossary, etc.). These files are injected into the system prompt at runtime, or loaded on demand by the `read` tool.
---
## III. Directory Structure
```
openclaw_trajectories_demo/
├── readme.md
├── coder-py-q01/ # one sample = one directory
│ ├── trajectory.json # full execution trajectory (the core training file)
│ ├── qa.json # compact query + answer (for fast browsing / indexing)
│ ├── workspace/ # workspace snapshot at the moment the agent enters this query
│ │ ├── AGENTS.md # role description / working style / tool conventions
│ │ ├── IDENTITY.md # identity / role definition
│ │ ├── SOUL.md # personality / tone / values
│ │ ├── USER.md # default user persona
│ │ ├── TOOLS.md # tool usage notes
│ │ ├── MEMORY.md # memory file index (tells the agent what is loadable)
│ │ ├── BOOTSTRAP.md # bootstrap notes
│ │ └── HEARTBEAT.md # heartbeat / status notes
│ └── memory/ # long-term memory snapshot (loaded on demand via `read`)
│ ├── <slot>.sqlite # OpenClaw built-in FTS index database
│ ├── python_style.md # domain-knowledge file 1
│ ├── past_incidents.md # domain-knowledge file 2
│ └── preferred_libs.md # domain-knowledge file 3
│
├── coder-py-q02/ # ...same directory layout
├── finance-cn-q01/
├── finance-cn-q02/
├── writer-tech-q01/
├── writer-tech-q02/
├── researcher-q01/
└── researcher-q02/
```
The 4 components inside each sample directory:
| Path | Type | Purpose |
|---|---|---|
| `trajectory.json` | JSON file | Main training payload — contains all model I/O |
| `qa.json` | JSON file (~250 B) | Just the query + final answer; useful for quick grep / stats |
| `workspace/` | directory (8 `.md`) | Snapshot of the agent's workspace at session start |
| `memory/` | directory (1 sqlite + N md) | Snapshot of the agent's long-term memory |
---
## IV. `trajectory.json` Field Reference
Each `trajectory.json` is a JSON object with the following top-level fields:
| Field | Type | Description |
|---|---|---|
| `schema_version` | string | Data format version (`openclaw-traj-v1`) |
| `session_id` | string | OpenClaw internal session UUID |
| `query_id` | string | Unique sample ID; matches the directory name |
| `collected_at` | ISO 8601 string | Collection timestamp (UTC) |
| `agent` | object | Agent metadata: `id` / `model` / `thinking_config` / `workspace_source` |
| `workspace` | object | Workspace reference: `dir` (relative path) + `files` (list of file names) |
| `memory` | object | Memory reference: `dir` + `copied` (file list) + `tables` (sqlite table list) |
| `system_prompt` | string | **Full system prompt** (~17,500 characters; comes directly from LLM request `messages[0]`) |
| `tools_schema` | array[object] | **Full JSON Schema definitions for all 18 tools** (function / parameters / required) |
| `user_query` | string | Original user input |
| `steps` | array[object] | **Step-by-step execution trace** (the central field, see below) |
| `final_answer` | string | Final answer (= the `content` of the last agent step) |
| `n_rounds` | int | Number of LLM call rounds (= tool call count + 1) |
### `steps` array structure
Each step represents one event in the conversation. There are 3 possible `source` values:
#### 1. `source: "user"` — user message
```json
{
"step_id": 1,
"source": "user",
"content": "Write a Python function that takes a list of integers and returns the second-largest unique value..."
}
```
#### 2. `source: "agent"` — model response
```json
{
"step_id": 2,
"source": "agent",
"reasoning_content": "The user wants... I should first check...",
"content": "```python\ndef second_max_unique(...):\n ...\n```",
"tool_calls": [
{
"id": "exec0",
"type": "function",
"name": "exec",
"arguments": {"command": "python second_largest.py"},
"arguments_raw": "{\"command\":\"python second_largest.py\"}"
}
]
}
```
| Field | Description |
|---|---|
| `reasoning_content` | The model's internal chain of thought (Kimi's native `reasoning_content` field; equivalent to OpenAI / Claude thinking blocks) |
| `content` | The text the model surfaces to the user |
| `tool_calls` | Tools the model invoked this round (can be an empty array) |
| `tool_calls[].id` | Tool call ID; correlated with the next `tool_result` step |
| `tool_calls[].name` | Tool name (matches a `function.name` in `tools_schema`) |
| `tool_calls[].arguments` | Parsed JSON arguments object |
| `tool_calls[].arguments_raw` | Original JSON string (for training that needs the literal form) |
#### 3. `source: "tool_result"` — tool execution result
```json
{
"step_id": 3,
"source": "tool_result",
"tool_call_id": "exec0",
"content": "All tests passed.\n"
}
```
| Field | Description |
|---|---|
| `tool_call_id` | Matches a `tool_calls[].id` from the previous agent step |
| `content` | Tool output (stdout / stderr / return value) |
### Step ordering rules
- The first step is always `source: "user"`
- The middle steps strictly alternate between `agent` and `tool_result` (agent invokes a tool → tool_result returns the output → agent continues)
- The last step is always `source: "agent"`, and its `content` equals `final_answer`
---
## V. Field-to-Training-Objective Mapping
| Training paradigm | Fields used |
|---|---|
| **SFT (supervised fine-tuning)** | `system_prompt` + `tools_schema` + `user_query` + (intermediate `tool_calls` + `tool_result`) → `content` as the label |
| **CoT / Reasoning SFT** | Same as above, but `reasoning_content` is also used as a label |
| **DPO / RLHF** | A trajectory becomes a chosen / rejected candidate (extra scoring required) |
| **Tool-use training** | `tools_schema` (as model input) + `tool_calls` (as label) |
| **Multi-turn agent SFT** | The full `steps` array unrolled into (input, output) pairs |
---
## VI. Memory Files
The `.md` files inside `memory/` are the **domain background knowledge** the agent holds at the moment of this query. OpenClaw exposes these to the agent in two ways:
1. **Injected into the system prompt**: The `MEMORY.md` file at the workspace root automatically lists all available memory file paths. The system prompt already contains this index.
2. **Loaded at runtime**: The agent calls `read` on a specific path (e.g. `read("memory/python_style.md")`).
`<slot>.sqlite` is OpenClaw's built-in FTS index database (10 tables). In the default FTS-only mode the index is empty, but the schema is intact and downstream consumers can re-index it as needed.
Memory files per persona:
| Persona | Memory files |
|---|---|
| `coder-py` | `python_style.md` (code conventions) / `past_incidents.md` (postmortems) / `preferred_libs.md` (recommended libs) |
| `finance-cn` | `estimation_methods.md` (valuation methods) / `key_metrics.md` (financial metrics) / `watch_list.md` (sector watch list) |
| `writer-tech` | `style_guide.md` (writing style guide) / `doc_templates.md` (doc templates) / `common_mistakes.md` (common mistakes) |
| `researcher` | `citation_format.md` (citation rules) / `preferred_sources.md` (trusted sources) / `research_method.md` (research methodology) |
---
## VII. Workspace Files
The 8 `.md` files in `workspace/` are **automatically loaded by OpenClaw at runtime and concatenated into the system prompt**. They play a role similar to `.cursorrules` / `AGENTS.md` configuration in coding agents like Cursor / Continue:
| File | Contents |
|---|---|
| `AGENTS.md` | Role description + workflow + tool usage conventions (the main body, ~7-8 KB) |
| `IDENTITY.md` | Identity / role definition (~600 B) |
| `SOUL.md` | Personality / tone / values (~1.5 KB) |
| `USER.md` | Default user persona (~500 B) |
| `TOOLS.md` | Tool usage notes (~850 B) |
| `MEMORY.md` | Memory file index (tells the agent what is loadable) |
| `BOOTSTRAP.md` | Bootstrap notes (auto-generated by OpenClaw) |
| `HEARTBEAT.md` | Heartbeat / status notes (auto-generated by OpenClaw) |
The full system prompt that ends up in the LLM call is **already saved** in `trajectory.json.system_prompt`. Consumers do **not** need to re-concatenate manually — just use that field directly. The `workspace/` directory exists primarily for:
- **Reproducibility**: Anyone can rebuild the system prompt from these source files
- **Interpretability**: A human reviewer can quickly see "why this agent behaves this way"
- **Customization**: To fork a new persona, simply edit these files
---
## VIII. Single-Sample Statistics Example
Take `coder-py-q01` as an example:
| Item | Value |
|---|---|
| User query | "Write a Python function that takes a list of integers and returns the second-largest unique value..." |
| Final answer length | 733 characters (full Python code + tests + complexity analysis) |
| n_rounds | 1 (no tool calls; answered directly) |
| Steps count | 2 (1 user + 1 agent) |
| system_prompt length | 17,549 characters |
| tools_schema count | 18 (read / write / edit / exec / web_search / web_fetch / memory_search / ...) |
| reasoning_content length | ~500 characters (Kimi's internal reasoning) |
| Workspace files | 8 |
| Memory files | 4 (1 sqlite + 3 md) |
| Total directory size | ~200 KB |
`finance-cn-q01` showcases the other typical case — with tool calls:
| Item | Value |
|---|---|
| n_rounds | 4 (3 tool calls) |
| Steps count | 8 (1 user + 4 agent + 3 tool_result) |
| Tool usage | `memory_search` → `read("memory/estimation_methods.md")` → `read("memory/key_metrics.md")` → final synthesis |
This sample fully demonstrates a typical multi-round workflow where the agent actively loads memory files to support its answer.
---
## IX. Collection Methodology
The data is captured from real OpenClaw runtime LLM API calls via an HTTP interceptor — **it is not synthesized or post-edited**. The collection pipeline:
1. Run a local HTTP proxy
2. Configure OpenClaw to point its LLM API base URL at the proxy
3. The proxy fully records every request (system_prompt + tools_schema + messages) and response (content + reasoning_content + tool_calls)
4. For each user query, run `openclaw agent --local` once; the proxy captures all LLM I/O for that run
5. Post-processing: combine the proxy logs + workspace files + memory files into the directory structure described above
**Guarantees**:
- `system_prompt` / `tools_schema` / `reasoning_content` come directly from real LLM API I/O, **without any editing**
- `workspace/` and `memory/` are actual snapshots of the state at query start time
- Workspaces are strictly isolated between queries (automatically restored after each run); samples never pollute each other
---
## X. Field Completeness Guarantee
| Field | Satisfied by 8 / 8 samples |
|---|---|
| `system_prompt` (≥ 1000 chars) | ✓ |
| `user_query` (non-empty) | ✓ |
| `tools_schema` (≥ 5 tools, with full JSON Schema) | ✓ |
| `steps` (at least 2 steps, first = user, last = agent) | ✓ |
| `final_answer` (non-empty) | ✓ |
| `agent` metadata | ✓ |
| `workspace/` (all 8 `.md` files present) | ✓ |
| `memory/` (1 sqlite + ≥ 3 md files) | ✓ |
| `reasoning_content` (in every agent step) | ✓ |
| `tool_calls` (samples that use tools have id / name / arguments) | ✓ |
| `tool_result` (samples that use tools have a corresponding result) | ✓ |
提供机构:
obaydata



