
chan4lk/okr-asms-corpus

Source: Hugging Face. Updated 2026-04-03. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/chan4lk/okr-asms-corpus
Description:
---
license: apache-2.0
language:
- en
size_categories:
- 1K<n<10K
task_categories:
- text-generation
- text-classification
tags:
- okr
- tool-use
- agent
- asms
- synthetic
- mcp
- keyflow
pretty_name: OKR Agent Synthetic Corpus (ASMS)
dataset_info:
  features:
  - name: input
    struct:
    - name: query
      dtype: string
    - name: session_context
      struct:
      - name: userId
        dtype: string
      - name: activeCycleId
        dtype: string
      - name: activeCycleName
        dtype: string
  - name: workflow
    dtype: string
  - name: tool_calls
    list:
    - name: tool
      dtype: string
    - name: action
      dtype: string
    - name: params
      dtype: string
  - name: methodology_notes
    dtype: string
  - name: _meta
    struct:
    - name: workflow
      dtype: string
    - name: category
      dtype: string
    - name: variation
      dtype: string
  splits:
  - name: train
    num_examples: 4895
  - name: validation
    num_examples: 576
  - name: test
    num_examples: 288
---

# OKR Agent Synthetic Corpus (ASMS)

A synthetic training corpus for an OKR (Objectives and Key Results) management agent, generated using **Agent-Specific Model Synthesis (ASMS)**, a pipeline that uses large LLMs as compilers to produce training data for task-specific micro-models.

## Dataset Description

5,759 (input, tool_calls, methodology_notes) training triples covering 6 OKR management workflows against the [Keyflow MCP](https://keyflow.tecbizsolutions.com) API.

Each example maps a natural language user query to:

1. A **workflow** classification (which of 6 workflows to invoke)
2. **Tool calls** with parameters (Keyflow MCP API calls)
3. **Methodology notes** (Doerr OKR methodology checks)

### Workflows

| Workflow | Description | Train | Val | Test |
|----------|-------------|-------|-----|------|
| `goal_to_okr` | Translate goals into structured OKRs | 1,226 | 147 | 77 |
| `check_in` | Update progress on key results | 1,173 | 132 | 55 |
| `view_okrs` | Display objectives and key results | 624 | 78 | 43 |
| `reports` | Generate progress/health/summary reports | 632 | 78 | 35 |
| `onboard` | Create starter OKRs for new hires | 615 | 76 | 38 |
| `align` | Cascade/connect objectives across levels | 625 | 65 | 40 |

### Category Distribution

| Category | Proportion | Description |
|----------|-----------|-------------|
| Normal | 80% | Standard business queries with varied phrasing |
| Edge | 15% | Ambiguous, missing context, boundary conditions |
| Adversarial | 5% | Anti-patterns, manipulation attempts, methodology violations |

### Tools (Keyflow MCP API)

| Tool | Actions |
|------|---------|
| `cycle` | create, list, get_active, close, set_active |
| `objective` | create, list, update, delete, search, align |
| `key_result` | create, list, update, check_in, delete |
| `user` | list, get, assign_role, generate_okrs, bulk_import_users |
| `report` | progress, health_check, summary |

## Example

```json
{
  "input": {
    "query": "I want to improve customer retention by 20% this quarter",
    "session_context": {
      "userId": "usr_042",
      "activeCycleId": "cyc_q2_2026",
      "activeCycleName": "Q2 2026"
    }
  },
  "workflow": "goal_to_okr",
  "tool_calls": [
    {
      "tool": "objective",
      "action": "create",
      "params": {
        "title": "Delight our customers with an exceptional retention experience",
        "description": "Focus on reducing churn and increasing customer lifetime value",
        "cycleId": "cyc_q2_2026",
        "ownerId": "usr_042",
        "type": "aspirational",
        "level": "team"
      }
    },
    {
      "tool": "key_result",
      "action": "create",
      "params": {
        "objectiveId": "obj_042a",
        "title": "Reduce monthly churn rate from 5% to 3%",
        "metricType": "PERCENTAGE",
        "startValue": 5,
        "targetValue": 3,
        "unit": "%"
      }
    }
  ],
  "methodology_notes": {
    "objective_is_qualitative": true,
    "krs_are_measurable": true,
    "okr_type": "aspirational",
    "anti_patterns_detected": []
  },
  "_meta": {
    "workflow": "goal_to_okr",
    "category": "normal",
    "variation": "SaaS retention improvement goal"
  }
}
```

## Generation Method

The corpus was generated using **Claude Sonnet 4.6** as the compiler LLM, following the ASMS pipeline:

1. Agent role specified in YAML (tools, workflows, constraints, Doerr methodology rules)
2. Parallel Sonnet agents generated batches of 100-200 examples per workflow
3. Category distribution (80/15/5) enforced per batch
4. All examples validated as parseable JSON

Total generation cost: **$0** (generated within Claude Code sessions, not via API)

## Intended Use

Training task-specific micro-models (5-100M parameters) for OKR agent tool-call generation.

The dataset follows John Doerr's "Measure What Matters" methodology:

- Objectives must be qualitative and inspirational (no numbers)
- Key results must be measurable with metric types: NUMERIC, PERCENTAGE, BOOLEAN, MILESTONE
- Aspirational OKRs target 0.7 score (stretch); committed OKRs target 1.0
- Anti-patterns detected: sandbagging, tasks-as-KRs, metrics-as-objectives, set-and-forget

## Associated Model

Trained model: [chan4lk/okr-micro-asms](https://huggingface.co/chan4lk/okr-micro-asms) (15M params, 80% workflow routing, 50% valid JSON tool calls)

## Citation

```bibtex
@dataset{ranaweera2026okrasms,
  title={OKR Agent Synthetic Corpus (ASMS)},
  author={Ranaweera, Chandima},
  year={2026},
  publisher={HuggingFace},
  note={Generated via Agent-Specific Model Synthesis}
}
```

## License

Apache 2.0
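
## Validation Sketch

The workflow list, the tool/action table, and two of the methodology rules in this card (objectives carry no numbers; key results use one of four metric types) are concrete enough to check mechanically. The following is a minimal sketch of such a check, not the validator used by the ASMS pipeline. The function names `as_dict` and `validate_record` are hypothetical. Because the feature schema declares `params` as a string while the example shows an object, the sketch assumes either form may appear and decodes JSON strings on the fly.

```python
import json
import re

# Tool -> allowed actions, copied from the "Tools (Keyflow MCP API)" table.
ALLOWED_ACTIONS = {
    "cycle": {"create", "list", "get_active", "close", "set_active"},
    "objective": {"create", "list", "update", "delete", "search", "align"},
    "key_result": {"create", "list", "update", "check_in", "delete"},
    "user": {"list", "get", "assign_role", "generate_okrs", "bulk_import_users"},
    "report": {"progress", "health_check", "summary"},
}
# Workflow names from the "Workflows" table.
WORKFLOWS = {"goal_to_okr", "check_in", "view_okrs", "reports", "onboard", "align"}
# Metric types listed under "Intended Use".
METRIC_TYPES = {"NUMERIC", "PERCENTAGE", "BOOLEAN", "MILESTONE"}


def as_dict(value):
    """Return value unchanged, or decode it first if it is a JSON string (assumption)."""
    return json.loads(value) if isinstance(value, str) else value


def validate_record(record):
    """Hypothetical helper: return a list of problems; an empty list means all checks passed."""
    problems = []
    if record.get("workflow") not in WORKFLOWS:
        problems.append(f"unknown workflow: {record.get('workflow')!r}")
    for i, call in enumerate(record.get("tool_calls", [])):
        tool, action = call.get("tool"), call.get("action")
        if action not in ALLOWED_ACTIONS.get(tool, set()):
            problems.append(f"tool_calls[{i}]: {tool}.{action} is not in the tool table")
            continue
        params = as_dict(call.get("params") or {})
        # "Objectives must be qualitative and inspirational (no numbers)".
        if (tool, action) == ("objective", "create") and re.search(r"\d", params.get("title", "")):
            problems.append(f"tool_calls[{i}]: objective title contains a number")
        # Key results must use one of the four listed metric types.
        if (tool, action) == ("key_result", "create") and params.get("metricType") not in METRIC_TYPES:
            problems.append(f"tool_calls[{i}]: unknown metricType {params.get('metricType')!r}")
    return problems


# Trimmed copy of the example record from this card.
EXAMPLE = {
    "workflow": "goal_to_okr",
    "tool_calls": [
        {"tool": "objective", "action": "create",
         "params": {"title": "Delight our customers with an exceptional retention experience"}},
        {"tool": "key_result", "action": "create",
         "params": {"title": "Reduce monthly churn rate from 5% to 3%", "metricType": "PERCENTAGE"}},
    ],
}

print(validate_record(EXAMPLE))  # prints []
```

Records could be obtained with `load_dataset("chan4lk/okr-asms-corpus")` from the Hugging Face `datasets` library and passed to `validate_record` one at a time. The card does not say whether target tool calls in the Edge and Adversarial categories obey these rules, so a reported problem on such a record is a prompt for inspection rather than proof of a defect.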
Provider:
chan4lk