atr0p05/aegis-training-v2
Source: Hugging Face. Updated 2025-12-08; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/atr0p05/aegis-training-v2
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- revops
- sales
- revenue-operations
- llm-training
- aegis
pretty_name: AEGIS RevOps Training Dataset v2
size_categories:
- 10K<n<100K
---
# AEGIS RevOps Training Dataset v2
Curated training data for the three-tier AEGIS Revenue Operations (RevOps) AI assistant.
## Dataset Description
This dataset contains training examples for three specialized models:
### Router (0.5B Model)
- **Purpose**: Intent classification for routing queries
- **Intents**: voice_simple, crm_lookup, complex_analysis, action_confirm, fallback
- **Examples**: ~2,600
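The router's job reduces to mapping each query onto one of the five intent labels listed above. The following is a minimal sketch of label normalization and validation. It assumes, though the card does not state this, that each router example stores its label as the content of the final assistant message. The normalization rule and the helper names `normalize_intent` and `router_label` are hypothetical.

```python
# The five router intents listed in this card.
INTENTS = {"voice_simple", "crm_lookup", "complex_analysis", "action_confirm", "fallback"}


def normalize_intent(raw: str) -> str:
    """Hypothetical rule: trim, lowercase, and map hyphens/spaces to underscores."""
    label = raw.strip().lower().replace("-", "_").replace(" ", "_")
    if label not in INTENTS:
        raise ValueError(f"unknown intent: {raw!r}")
    return label


def router_label(example: dict) -> str:
    """Assumes the label is the last assistant message's content (not confirmed by the card)."""
    assistant_turns = [m for m in example["messages"] if m["role"] == "assistant"]
    if not assistant_turns:
        raise ValueError("example has no assistant message")
    return normalize_intent(assistant_turns[-1]["content"])


example = {
    "messages": [
        {"role": "user", "content": "What's the status of the Acme deal?"},
        {"role": "assistant", "content": "CRM-Lookup"},
    ]
}
print(router_label(example))  # crm_lookup
```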
### Voice (7B Model)
- **Purpose**: Quick, conversational responses
- **Focus**: Concise RevOps answers for voice/chat
- **Examples**: ~4,700
### Main (72B Model)
- **Purpose**: Complex analysis with chain-of-thought reasoning
- **Features**: All responses include `<think>...</think>` blocks
- **Examples**: ~3,700
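Every Main response carries a `<think>...</think>` block, so a common preprocessing step is to separate the reasoning from the final answer. The sketch below treats a response as valid when it starts with exactly one non-empty `<think>` block followed by a non-empty answer. The card does not define "valid", so this rule and the `split_think` name are assumptions.

```python
import re

# Assumed layout: one leading <think>...</think> block, then the visible answer.
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*(.*)$", re.DOTALL)


def split_think(response: str) -> tuple[str, str]:
    """Return (reasoning, answer); raise ValueError if the assumed format is violated."""
    match = THINK_RE.match(response)
    if match is None:
        raise ValueError("response does not start with a <think>...</think> block")
    reasoning, answer = match.group(1).strip(), match.group(2).strip()
    if not reasoning or not answer:
        raise ValueError("empty reasoning or empty answer")
    if "<think>" in answer or "</think>" in answer:
        raise ValueError("more than one think block")
    return reasoning, answer


reasoning, answer = split_think(
    "<think>Pipeline is $1.2M against a $400K quota, so coverage is 3x.</think>"
    "Pipeline coverage is 3x quota."
)
print(answer)  # Pipeline coverage is 3x quota.
```

A non-greedy `(.*?)` stops at the first `</think>`, so any second think block ends up in the answer and is rejected by the final check.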
## v2 Improvements
- ✅ Removed near-duplicate examples (reducing the example count by 44%)
- ✅ Added multi-turn conversation examples
- ✅ Expanded churn/retention topic coverage
- ✅ 100% of Main examples have valid `<think>` blocks
- ✅ Clean, normalized router labels
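The card does not say how near-duplicates were detected for v2. One simple approach compares word 3-gram shingles with Jaccard similarity. It keeps an example only if it is less similar than a threshold to every example already kept. The `0.8` threshold, the shingle size, and the function names below are assumptions. This quadratic loop is only practical for small sets; MinHash with locality-sensitive hashing is the usual choice at scale.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercased word n-grams; texts shorter than n words become a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedupe(texts: list[str], threshold: float = 0.8) -> list[str]:
    """Greedy O(n^2) filter: keep a text only if no kept text is >= threshold similar."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for text in texts:
        sh = shingles(text)
        if all(jaccard(sh, other) < threshold for other in kept_shingles):
            kept.append(text)
            kept_shingles.append(sh)
    return kept


queries = [
    "What is our pipeline coverage for Q3?",
    "what is our pipeline coverage for Q3?",
    "How do I calculate commission on a multi-year deal?",
]
print(len(dedupe(queries)))  # 2
```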
## Usage
```python
from datasets import load_dataset
# Load router training data
router = load_dataset("atr0p05/aegis-training-v2", data_dir="router")
# Load main model training data
main = load_dataset("atr0p05/aegis-training-v2", data_dir="main")
# Load voice model training data
voice = load_dataset("atr0p05/aegis-training-v2", data_dir="voice")
```
## Format
All examples are in chat format:
```json
{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ]
}
```
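Because v2 includes multi-turn conversations, a structural check on each record before training can catch malformed examples. The sketch below assumes an optional leading `system` message, followed by turns that alternate `user`/`assistant` and end with `assistant`. The card shows only the single-turn shape, so this ordering rule and the `validate_messages` name are assumptions.

```python
ROLES = {"system", "user", "assistant"}


def validate_messages(record: dict) -> list[str]:
    """Return the problems found in one chat-format record (an empty list means OK)."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if msg.get("role") not in ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            problems.append(f"message {i}: missing or empty content")
    # Assumed ordering: optional system prompt, then strict user/assistant alternation.
    turns = [m.get("role") for m in messages]
    if turns[0] == "system":
        turns = turns[1:]
    expected = ["user", "assistant"] * (len(turns) // 2 + 1)
    if turns != expected[: len(turns)]:
        problems.append(f"roles do not alternate user/assistant: {turns}")
    if not turns or turns[-1] != "assistant":
        problems.append("conversation must end with an assistant message")
    return problems


multi_turn = {
    "messages": [
        {"role": "system", "content": "You are AEGIS."},
        {"role": "user", "content": "What's our Q3 win rate?"},
        {"role": "assistant", "content": "Your Q3 win rate is ..."},
        {"role": "user", "content": "How does that compare to Q2?"},
        {"role": "assistant", "content": "Compared with Q2 ..."},
    ]
}
print(validate_messages(multi_turn))  # []
```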
## Topics Covered
- Pipeline analysis and forecasting
- Commission calculations
- Win/loss analysis
- Quota planning and territory management
- Churn and retention analysis
- Deal health scoring
- Sales metrics and KPIs
- Multi-turn conversations
- Safety/edge cases
## License
Apache 2.0
Provided by:
atr0p05