atr0p05/aegis-training-v2.2

Hugging Face · Updated 2025-12-08 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/atr0p05/aegis-training-v2.2
Description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- revops
- sales
- revenue-operations
- llm-training
- aegis
pretty_name: AEGIS RevOps Training Dataset v2.2
size_categories:
- 1K<n<10K
---

# AEGIS RevOps Training Dataset v2.2

High-quality, domain-focused training data for the AEGIS 3-tier Revenue Operations AI assistant.

## v2.2 Improvements (over v2.1)

- **Main model expanded**: 2,025 examples (up from 825)
- **+1,200 synthetic RevOps examples** across 8 categories:
  - Pipeline coverage (200)
  - Win rate analysis (150)
  - Commission calculations (150)
  - Churn analysis (150)
  - Forecasting (150)
  - Quota planning (100)
  - Territory management (100)
  - Table-based prompts (200)
- **Voice bridging data**: 150 handoff examples (voice → main)
- **Router calibration**: 200 additional calibration examples
- **Total dataset**: 5,416 examples

## Dataset Description

### Router (0.5B Model)

- **Purpose**: Intent classification for routing queries
- **Intents**: `voice_simple`, `crm_lookup`, `complex_analysis`, `action_confirm`, `fallback`
- **Features**: Hard negatives + calibration examples
- **Count**: 2,084 examples

### Voice (7B Model)

- **Purpose**: Quick, conversational responses
- **Focus**: Concise RevOps answers for voice/chat
- **Features**: Bridging data for handoff to Main model
- **Count**: 1,307 examples

### Main (72B Model)

- **Purpose**: Complex analysis with chain-of-thought reasoning
- **Features**: All responses include `<think>...</think>` blocks
- **Count**: 2,025 examples

## Usage

```python
from datasets import load_dataset

# Load router training data
router = load_dataset("atr0p05/aegis-training-v2.2", data_dir="router")

# Load main model training data
main = load_dataset("atr0p05/aegis-training-v2.2", data_dir="main")

# Load voice model training data
voice = load_dataset("atr0p05/aegis-training-v2.2", data_dir="voice")
```

## `<think>` Block Policy

The Main model uses `<think>...</think>` blocks for chain-of-thought reasoning:

```
<think>
[Internal reasoning here]
</think>
[User-facing response here]
```

**Recommended inference approach:**

- Train with `<think>` blocks (teaches reasoning)
- Strip `<think>...</think>` at serving time
- User sees only the response after `</think>`

## Topics Covered

- Pipeline analysis and forecasting
- Commission calculations
- Win/loss analysis
- Quota planning and territory management
- Churn and retention analysis
- Deal health scoring
- Sales metrics and KPIs
- CAC/LTV/NRR/ARR calculations
- Ramp time optimization
- Table-based analytics (markdown tables)
- Voice-to-analyst handoffs

## Training Approach

**Recommended 2-phase training:**

1. **Phase 1 - Identity Lock-in**: Train on v2.2 only (2-3 epochs)
2. **Phase 2 - Coverage Expansion**: Continue with mixed data if needed

## Version History

| Version | Main  | Router | Voice | Total | Notes               |
|---------|-------|--------|-------|-------|---------------------|
| v2.0    | ~3K   | ~2K    | ~4K   | 9,289 | Initial dataset     |
| v2.1    | 825   | 1,734  | 1,157 | 4,373 | Filtered non-RevOps |
| v2.2    | 2,025 | 2,084  | 1,307 | 5,416 | Synthetic expansion |

## License

Apache 2.0
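The `<think>` Block Policy above recommends keeping the reasoning in training data but stripping it at serving time, so the user sees only the text after `</think>`. A minimal Python sketch of such a serving-time filter follows. It assumes each Main-model completion contains at most one `<think>...</think>` block, as in the card's format example; the function names `split_think` and `strip_think` and the sample completion are hypothetical illustrations, not part of the dataset or of any library.

```python
import re

# One <think>...</think> block; re.DOTALL lets "." span the newlines inside
# the reasoning, and the non-greedy ".*?" stops at the first </think>.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(completion: str) -> tuple[str, str]:
    """Split a Main-model completion into (reasoning, user_facing_response).

    Assumption: at most one <think>...</think> block per completion.
    Everything after </think> is treated as the user-facing response.
    """
    match = THINK_BLOCK.search(completion)
    if match is not None:
        return match.group(1).strip(), completion[match.end():].strip()
    if "<think>" in completion:
        # Block opened but never closed (e.g. generation hit a token limit):
        # treat all of it as reasoning and show the user nothing, so no
        # internal reasoning leaks into the response.
        return completion.split("<think>", 1)[1].strip(), ""
    # No reasoning block at all: pass the text through, trimmed.
    return "", completion.strip()


def strip_think(completion: str) -> str:
    """Serving-time filter: return only what the user should see."""
    return split_think(completion)[1]


if __name__ == "__main__":
    # Hypothetical completion in the card's <think> format.
    sample = (
        "<think>\nCoverage = open pipeline / remaining quota.\n</think>\n"
        "Your pipeline coverage is 3.2x."
    )
    print(strip_think(sample))
```

The unclosed-block branch is a design choice under the stated assumption: returning an empty response is safer than exposing partial reasoning, but a deployment might instead retry generation or fall back to another tier.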
Provided by:
atr0p05