atr0p05/aegis-training-v2.2
Hugging Face. Updated 2025-12-08; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/atr0p05/aegis-training-v2.2
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- revops
- sales
- revenue-operations
- llm-training
- aegis
pretty_name: AEGIS RevOps Training Dataset v2.2
size_categories:
- 1K<n<10K
---
# AEGIS RevOps Training Dataset v2.2
High-quality, domain-focused training data for the AEGIS 3-tier Revenue Operations (RevOps) AI assistant, with one subset per tier: router, voice, and main.
## v2.2 Improvements (over v2.1)
- **Main model expanded**: 2,025 examples (up from 825)
- **+1,200 synthetic RevOps examples** across 8 categories:
  - Pipeline coverage (200)
  - Win rate analysis (150)
  - Commission calculations (150)
  - Churn analysis (150)
  - Forecasting (150)
  - Quota planning (100)
  - Territory management (100)
  - Table-based prompts (200)
- **Voice bridging data**: 150 handoff examples (voice → main)
- **Router calibration**: 200 additional calibration examples
- **Total dataset**: 5,416 examples
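The counts in this list can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from this card; the dictionary keys are illustrative labels, not field names from the dataset.

```python
# Per-category counts for the synthetic RevOps examples, copied from the list above.
# The keys are illustrative labels, not identifiers used in the dataset files.
synthetic_counts = {
    "pipeline_coverage": 200,
    "win_rate_analysis": 150,
    "commission_calculations": 150,
    "churn_analysis": 150,
    "forecasting": 150,
    "quota_planning": 100,
    "territory_management": 100,
    "table_based_prompts": 200,
}

synthetic_total = sum(synthetic_counts.values())

# Main subset: the v2.1 size plus the synthetic expansion.
main_v21 = 825
main_v22 = main_v21 + synthetic_total

# Per-tier counts as stated in the Dataset Description section of this card.
split_counts = {"router": 2084, "voice": 1307, "main": main_v22}
dataset_total = sum(split_counts.values())

print(synthetic_total)  # 1200
print(main_v22)         # 2025
print(dataset_total)    # 5416
```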
## Dataset Description
### Router (0.5B Model)
- **Purpose**: Intent classification for routing queries
- **Intents**: voice_simple, crm_lookup, complex_analysis, action_confirm, fallback
- **Features**: Hard negatives + calibration examples
- **Count**: 2,084 examples
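When consuming router predictions, it can help to validate the predicted label against the five intents listed above. The following is a minimal sketch, not part of the dataset: the function name `normalize_intent` is hypothetical, and mapping any unrecognized label to `fallback` is an assumption about how a caller might want to behave.

```python
# The five router intents listed in this card.
ROUTER_INTENTS = frozenset(
    {"voice_simple", "crm_lookup", "complex_analysis", "action_confirm", "fallback"}
)


def normalize_intent(predicted: str) -> str:
    """Return a known intent label, or "fallback" for anything unrecognized.

    Assumption: the router emits the intent as plain text, possibly with
    stray whitespace or inconsistent casing. This card does not specify
    the router's output format.
    """
    label = predicted.strip().lower()
    return label if label in ROUTER_INTENTS else "fallback"


print(normalize_intent(" CRM_Lookup\n"))   # crm_lookup
print(normalize_intent("billing_question"))  # fallback
```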
### Voice (7B Model)
- **Purpose**: Quick, conversational responses
- **Focus**: Concise RevOps answers for voice/chat
- **Features**: Bridging data for handoff to Main model
- **Count**: 1,307 examples
### Main (72B Model)
- **Purpose**: Complex analysis with chain-of-thought reasoning
- **Features**: All responses include `<think>...</think>` blocks
- **Count**: 2,025 examples
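Since every Main response is expected to carry a `<think>...</think>` block, a quick well-formedness check can be run before training. This is a sketch under stated assumptions: that each response opens with exactly one block and is followed by non-empty user-facing text. The helper name `is_well_formed` is hypothetical.

```python
import re

# Assumed layout: optional leading whitespace, one <think> block, then the answer.
THINK_RESPONSE = re.compile(
    r"\A\s*<think>(?P<reasoning>.*?)</think>(?P<answer>.*)\Z", re.DOTALL
)


def is_well_formed(response: str) -> bool:
    """Check for a single leading <think> block followed by a non-empty answer."""
    match = THINK_RESPONSE.match(response)
    if match is None:
        return False
    reasoning = match.group("reasoning")
    answer = match.group("answer")
    # Reject nested or repeated tags, and responses with no user-facing text.
    if "<think>" in reasoning or "<think>" in answer or "</think>" in answer:
        return False
    return bool(answer.strip())


print(is_well_formed("<think>\nCompare Q3 to quota.\n</think>\nCoverage is on track."))
print(is_well_formed("Coverage is on track."))
```

The first call prints `True` and the second prints `False`, since the second response has no reasoning block.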
## Usage
```python
from datasets import load_dataset
# Load router training data
router = load_dataset("atr0p05/aegis-training-v2.2", data_dir="router")
# Load main model training data
main = load_dataset("atr0p05/aegis-training-v2.2", data_dir="main")
# Load voice model training data
voice = load_dataset("atr0p05/aegis-training-v2.2", data_dir="voice")
```
## `<think>` Block Policy
The Main model uses `<think>...</think>` blocks for chain-of-thought reasoning:
```
<think>
[Internal reasoning here]
</think>
[User-facing response here]
```
**Recommended inference approach:**
- Train with `<think>` blocks (teaches reasoning)
- Strip `<think>...</think>` at serving time
- User sees only the response after `</think>`
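Stripping the reasoning at serving time can be done with a small post-processing step. The sketch below uses a regular expression; the function name `strip_think` is hypothetical, and it assumes the model emits complete, non-nested `<think>...</think>` pairs.

```python
import re

# Non-greedy match so that each complete <think>...</think> pair is removed separately.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove complete <think> blocks and return the user-facing response."""
    return _THINK_BLOCK.sub("", text).strip()


raw = "<think>\nSum the open pipeline rows.\n</think>\nPipeline coverage is healthy."
print(strip_think(raw))  # Pipeline coverage is healthy.
```

An unclosed `<think>` tag, for example from a generation truncated by a token limit, does not match the pattern and passes through unchanged, so a serving layer may want to handle that case separately.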
## Topics Covered
- Pipeline analysis and forecasting
- Commission calculations
- Win/loss analysis
- Quota planning and territory management
- Churn and retention analysis
- Deal health scoring
- Sales metrics and KPIs
- CAC/LTV/NRR/ARR calculations
- Ramp time optimization
- Table-based analytics (markdown tables)
- Voice-to-analyst handoffs
## Training Approach
**Recommended 2-phase training:**
1. **Phase 1 - Identity Lock-in**: Train on v2.2 only (2-3 epochs)
2. **Phase 2 - Coverage Expansion**: Continue with mixed data if needed
## Version History
| Version | Main | Router | Voice | Total | Notes |
|---------|------|--------|-------|-------|-------|
| v2.0 | ~3K | ~2K | ~4K | 9,289 | Initial dataset |
| v2.1 | 825 | 1,734 | 1,157 | 4,373 | Filtered non-RevOps |
| v2.2 | 2,025 | 2,084 | 1,307 | 5,416 | Synthetic expansion |
## License
Apache 2.0
Provider:
atr0p05



