chan4lk/okr-asms-corpus
Source: Hugging Face. Updated 2026-04-03; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/chan4lk/okr-asms-corpus
Resource description:
---
license: apache-2.0
language:
- en
size_categories:
- 1K<n<10K
task_categories:
- text-generation
- text-classification
tags:
- okr
- tool-use
- agent
- asms
- synthetic
- mcp
- keyflow
pretty_name: OKR Agent Synthetic Corpus (ASMS)
dataset_info:
  features:
  - name: input
    struct:
    - name: query
      dtype: string
    - name: session_context
      struct:
      - name: userId
        dtype: string
      - name: activeCycleId
        dtype: string
      - name: activeCycleName
        dtype: string
  - name: workflow
    dtype: string
  - name: tool_calls
    list:
    - name: tool
      dtype: string
    - name: action
      dtype: string
    - name: params
      dtype: string
  - name: methodology_notes
    dtype: string
  - name: _meta
    struct:
    - name: workflow
      dtype: string
    - name: category
      dtype: string
    - name: variation
      dtype: string
  splits:
  - name: train
    num_examples: 4895
  - name: validation
    num_examples: 576
  - name: test
    num_examples: 288
---
# OKR Agent Synthetic Corpus (ASMS)
A synthetic training corpus for an OKR (Objectives and Key Results) management agent, generated using **Agent-Specific Model Synthesis (ASMS)**, a pipeline that uses large language models as compilers to produce training data for task-specific micro-models.
## Dataset Description
5,759 (input, tool_calls, methodology_notes) triples, split into 4,895 train, 576 validation, and 288 test examples, covering 6 OKR management workflows against the [Keyflow MCP](https://keyflow.tecbizsolutions.com) API.
Each example maps a natural language user query to:
1. A **workflow** classification (which of 6 workflows to invoke)
2. **Tool calls** with parameters (Keyflow MCP API calls)
3. **Methodology notes** (Doerr OKR methodology checks)
### Workflows
| Workflow | Description | Train | Val | Test |
|----------|-------------|-------|-----|------|
| `goal_to_okr` | Translate goals into structured OKRs | 1,226 | 147 | 77 |
| `check_in` | Update progress on key results | 1,173 | 132 | 55 |
| `view_okrs` | Display objectives and key results | 624 | 78 | 43 |
| `reports` | Generate progress/health/summary reports | 632 | 78 | 35 |
| `onboard` | Create starter OKRs for new hires | 615 | 76 | 38 |
| `align` | Cascade/connect objectives across levels | 625 | 65 | 40 |
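As a quick consistency check, the per-workflow counts in the table can be summed and compared against the split sizes declared in the card metadata. The sketch below copies the numbers from the table; the function names are illustrative and not part of the dataset.

```python
# Per-workflow counts copied from the table above: (train, val, test).
WORKFLOW_COUNTS = {
    "goal_to_okr": (1226, 147, 77),
    "check_in": (1173, 132, 55),
    "view_okrs": (624, 78, 43),
    "reports": (632, 78, 35),
    "onboard": (615, 76, 38),
    "align": (625, 65, 40),
}


def split_totals(counts):
    """Sum each column to get the size of the train/validation/test splits."""
    train, val, test = (sum(col) for col in zip(*counts.values()))
    return {"train": train, "validation": val, "test": test}


def workflow_share(counts):
    """Fraction of all examples that belongs to each workflow."""
    grand_total = sum(sum(row) for row in counts.values())
    return {name: sum(row) / grand_total for name, row in counts.items()}


totals = split_totals(WORKFLOW_COUNTS)
shares = workflow_share(WORKFLOW_COUNTS)
```

The column sums match the declared splits (4,895 / 576 / 288). `goal_to_okr` and `check_in` together account for roughly 49% of all examples, which may matter when weighting or sampling by workflow.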
### Category Distribution
| Category | Proportion | Description |
|----------|-----------|-------------|
| Normal | 80% | Standard business queries with varied phrasing |
| Edge | 15% | Ambiguous, missing context, boundary conditions |
| Adversarial | 5% | Anti-patterns, manipulation attempts, methodology violations |
### Tools (Keyflow MCP API)
| Tool | Actions |
|------|---------|
| `cycle` | create, list, get_active, close, set_active |
| `objective` | create, list, update, delete, search, align |
| `key_result` | create, list, update, check_in, delete |
| `user` | list, get, assign_role, generate_okrs, bulk_import_users |
| `report` | progress, health_check, summary |
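The table above can double as a whitelist when checking generated tool calls. The following is a minimal sketch, assuming each call is a dict with `tool` and `action` keys as in the dataset schema; `invalid_tool_calls` is a hypothetical helper, not part of the Keyflow API.

```python
# Tool -> allowed actions, copied from the table above.
KEYFLOW_TOOLS = {
    "cycle": {"create", "list", "get_active", "close", "set_active"},
    "objective": {"create", "list", "update", "delete", "search", "align"},
    "key_result": {"create", "list", "update", "check_in", "delete"},
    "user": {"list", "get", "assign_role", "generate_okrs", "bulk_import_users"},
    "report": {"progress", "health_check", "summary"},
}


def invalid_tool_calls(tool_calls):
    """Return (index, reason) pairs for calls naming an unknown tool or action."""
    problems = []
    for i, call in enumerate(tool_calls):
        tool, action = call.get("tool"), call.get("action")
        if tool not in KEYFLOW_TOOLS:
            problems.append((i, f"unknown tool: {tool!r}"))
        elif action not in KEYFLOW_TOOLS[tool]:
            problems.append((i, f"unknown action for {tool}: {action!r}"))
    return problems
```

An empty result means every call names a tool/action pair from the table; it says nothing about whether the parameters are valid.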
## Example
```json
{
  "input": {
    "query": "I want to improve customer retention by 20% this quarter",
    "session_context": {
      "userId": "usr_042",
      "activeCycleId": "cyc_q2_2026",
      "activeCycleName": "Q2 2026"
    }
  },
  "workflow": "goal_to_okr",
  "tool_calls": [
    {
      "tool": "objective",
      "action": "create",
      "params": {
        "title": "Delight our customers with an exceptional retention experience",
        "description": "Focus on reducing churn and increasing customer lifetime value",
        "cycleId": "cyc_q2_2026",
        "ownerId": "usr_042",
        "type": "aspirational",
        "level": "team"
      }
    },
    {
      "tool": "key_result",
      "action": "create",
      "params": {
        "objectiveId": "obj_042a",
        "title": "Reduce monthly churn rate from 5% to 3%",
        "metricType": "PERCENTAGE",
        "startValue": 5,
        "targetValue": 3,
        "unit": "%"
      }
    }
  ],
  "methodology_notes": {
    "objective_is_qualitative": true,
    "krs_are_measurable": true,
    "okr_type": "aspirational",
    "anti_patterns_detected": []
  },
  "_meta": {
    "workflow": "goal_to_okr",
    "category": "normal",
    "variation": "SaaS retention improvement goal"
  }
}
```
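The card metadata declares `params` and `methodology_notes` with `dtype: string`, while the example shows them as nested objects. One plausible reading, which is an assumption here and not confirmed by the card, is that these fields are stored as JSON-encoded strings. The sketch below normalizes a record under that assumption and leaves already-decoded values untouched; `decode_field` and `normalize_record` are hypothetical helper names.

```python
import json


def decode_field(value):
    """Decode a JSON string into an object; pass other values through.

    Raises json.JSONDecodeError if a string is not valid JSON.
    """
    if isinstance(value, str):
        return json.loads(value)
    return value


def normalize_record(record):
    """Copy a record with `params` and `methodology_notes` decoded to objects."""
    out = dict(record)
    out["tool_calls"] = [
        {**call, "params": decode_field(call.get("params", {}))}
        for call in record.get("tool_calls", [])
    ]
    out["methodology_notes"] = decode_field(record.get("methodology_notes", {}))
    return out
```

Records loaded with the Hugging Face `datasets` library, for example via `load_dataset("chan4lk/okr-asms-corpus")`, could be passed through `normalize_record` before inspection. Applying it twice is harmless, since decoded objects are returned unchanged.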
## Generation Method
The corpus was generated using **Claude Sonnet 4.6** as the compiler LLM, following the ASMS pipeline:
1. Agent role specified in YAML (tools, workflows, constraints, Doerr methodology rules)
2. Parallel Sonnet agents generated batches of 100-200 examples per workflow
3. Category distribution (80/15/5) enforced per batch
4. All examples validated as parseable JSON
Total generation cost: **$0** (generated within Claude Code sessions, not via API)
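Steps 3 and 4 above can be sketched as follows. This is an illustration of the idea and not the pipeline's actual code: the quota rounding rule, the one-example-per-line layout, and the function names are all assumptions.

```python
import json

# Target category mix from the card, as integer percentages.
CATEGORY_MIX = {"normal": 80, "edge": 15, "adversarial": 5}


def category_quotas(batch_size):
    """Split a batch into per-category counts that sum to batch_size."""
    quotas = {name: batch_size * pct // 100 for name, pct in CATEGORY_MIX.items()}
    # Assumption: any rounding remainder goes to the "normal" category.
    quotas["normal"] += batch_size - sum(quotas.values())
    return quotas


def parseable(lines):
    """Keep only lines that parse as JSON objects (assumes one example per line)."""
    kept = []
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop anything that is not valid JSON
        if isinstance(obj, dict):
            kept.append(obj)
    return kept
```

For a batch of 100 the quotas are exactly 80/15/5; for sizes that do not divide evenly, such as 150, the remainder handling decides which category absorbs the extra example.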
## Intended Use
Training task-specific micro-models (5-100M parameters) for OKR agent tool-call generation. The dataset follows John Doerr's "Measure What Matters" methodology:
- Objectives must be qualitative and inspirational (no numbers)
- Key results must be measurable with metric types: NUMERIC, PERCENTAGE, BOOLEAN, MILESTONE
- Aspirational OKRs target 0.7 score (stretch); committed OKRs target 1.0
- Anti-patterns detected: sandbagging, tasks-as-KRs, metrics-as-objectives, set-and-forget
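The first three rules above lend themselves to simple programmatic checks. The sketch below is a rough heuristic under stated assumptions: "no numbers" is approximated as "no digits in the title", and the key names (`metricType`) follow the example record. The function names are hypothetical and do not come from the dataset or the Keyflow API.

```python
import re

# Metric types and target scores as listed in the rules above.
METRIC_TYPES = {"NUMERIC", "PERCENTAGE", "BOOLEAN", "MILESTONE"}
TARGET_SCORE = {"aspirational": 0.7, "committed": 1.0}


def objective_is_qualitative(title):
    """Heuristic: a qualitative objective title contains no digits."""
    return re.search(r"\d", title) is None


def kr_has_valid_metric(params):
    """A key result must declare one of the four supported metric types."""
    return params.get("metricType") in METRIC_TYPES


def meets_target(okr_type, score):
    """Compare a final score against the target for the given OKR type."""
    return score >= TARGET_SCORE[okr_type]
```

With the example record, the objective title passes the digit check, whereas the raw user query ("improve customer retention by 20%") would not, which is the kind of metrics-as-objectives phrasing the anti-pattern list describes.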
## Associated Model
Trained model: [chan4lk/okr-micro-asms](https://huggingface.co/chan4lk/okr-micro-asms) (15M params, 80% workflow routing, 50% valid JSON tool calls)
## Citation
```bibtex
@dataset{ranaweera2026okrasms,
title={OKR Agent Synthetic Corpus (ASMS)},
author={Ranaweera, Chandima},
year={2026},
publisher={HuggingFace},
note={Generated via Agent-Specific Model Synthesis}
}
```
## License
Apache 2.0
Provider:
chan4lk



