thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data
Source: Hugging Face. Updated 2025-11-30; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- eagle3
- speculative-decoding
- conversation
- synthetic
size_categories:
- 100K<n<1M
---
# EAGLE3-Apertus-8B-Instruct-2509-Data
Training dataset for the [thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509](https://huggingface.co/thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509) speculative decoding draft model.
## Dataset Description
This dataset contains ~375k multi-turn conversations used to train an Eagle3 draft model for [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509).
### Data Sources
The prompts are sourced from:
- [UltraChat](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) - Large-scale multi-turn dialogue dataset
- [ShareGPT](https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered) - Real user conversations
- [OpenThoughts-114k-math](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math) - Mathematical reasoning data
### Regeneration Process
The assistant responses were **regenerated** with Apertus-8B-Instruct-2509 instead of reusing the responses from the source datasets. The draft model therefore learns to predict tokens from the target model's own output distribution, which is critical for effective speculative decoding: drafted tokens only save time when the target model accepts them during verification.
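The regeneration step can be sketched roughly as follows. This is a minimal illustration, not the pipeline that produced the dataset: `regenerate_conversation`, the `generate` callable, and `echo_model` are hypothetical names, and conditioning each reply on the already regenerated history (rather than on the original responses) is an assumption. In practice `generate` would wrap a call to an inference server hosting the target model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def regenerate_conversation(
    original: List[Message],
    generate: Callable[[List[Message]], str],
) -> List[Message]:
    """Keep the user turns and replace every assistant turn with a fresh reply.

    `generate` is a hypothetical stand-in for the target model: it receives the
    conversation so far and returns the next assistant message.
    """
    regenerated: List[Message] = []
    for turn in original:
        if turn["role"] != "user":
            # Original assistant responses (and any other roles) are dropped.
            continue
        regenerated.append({"role": "user", "content": turn["content"]})
        # Assumption: the reply is conditioned on the regenerated history.
        reply = generate(regenerated)
        regenerated.append({"role": "assistant", "content": reply})
    return regenerated


def echo_model(history: List[Message]) -> str:
    """Toy stand-in for the target model, used only to exercise the loop."""
    return f"reply to: {history[-1]['content']}"


example = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "old answer 1"},
    {"role": "user", "content": "Tell me more"},
    {"role": "assistant", "content": "old answer 2"},
]
result = regenerate_conversation(example, echo_model)
```

The output keeps the alternating user/assistant layout of the source prompts while none of the original assistant text survives.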
## Dataset Format
Each line of the JSONL file is one JSON object with the following structure (pretty-printed here for readability):
```json
{
  "id": "unique_sha256_hash",
  "conversations": [
    {"role": "user", "content": "User message..."},
    {"role": "assistant", "content": "Regenerated assistant response..."},
    {"role": "user", "content": "Follow-up question..."},
    {"role": "assistant", "content": "Regenerated follow-up response..."}
  ],
  "status": "success"
}
```
### Fields
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | SHA-256 hash identifier for the conversation |
| `conversations` | array | List of conversation turns with role and content |
| `status` | string | Processing status (`success` indicates valid sample) |
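A record can be checked against the fields above before training. The sketch below is illustrative only: `is_valid_record` is a hypothetical helper, and restricting roles to `user` and `assistant` is an assumption based on the sample record, since the card does not list every role or `status` value that may occur.

```python
from typing import Any, Dict

# Assumption: only these two roles occur, as in the sample record.
ALLOWED_ROLES = ("user", "assistant")


def is_valid_record(record: Dict[str, Any]) -> bool:
    """Return True if a record has the documented fields and status `success`."""
    if record.get("status") != "success":
        return False
    if not isinstance(record.get("id"), str):
        return False
    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        return False
    for turn in turns:
        if not isinstance(turn, dict):
            return False
        if turn.get("role") not in ALLOWED_ROLES:
            return False
        if not isinstance(turn.get("content"), str):
            return False
    return True


good = {
    "id": "abc123",
    "conversations": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there"},
    ],
    "status": "success",
}
bad_status = dict(good, status="failed")  # "failed" is a made-up value
missing_content = dict(good, conversations=[{"role": "user"}])
```

Filtering then reduces to `[r for r in records if is_valid_record(r)]`.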
## Dataset Statistics
- **Format**: JSONL
- **Samples**: 375,573
- **Languages**: Primarily English
## Usage
### Load with Hugging Face Datasets
```python
from datasets import load_dataset
dataset = load_dataset("thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data")
```
### Load Directly
```python
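import json

# Read the JSONL file line by line; each non-empty line is one conversation record.
conversations = []
with open("merged_train_regen.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        if line.strip():
            conversations.append(json.loads(line))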
```
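When reading the file directly, a quick sanity check is to count how many turns each conversation has. The helper below is a small sketch; `turn_histogram` is a hypothetical name, and the sample lines are made-up records that only mimic the documented format.

```python
import json
from collections import Counter
from typing import Iterable


def turn_histogram(lines: Iterable[str]) -> Counter:
    """Map number of turns -> number of conversations, reading JSONL lines lazily."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[len(record["conversations"])] += 1
    return counts


def make_line(num_turns: int) -> str:
    """Build a made-up JSONL line with alternating user/assistant turns."""
    roles = ["user", "assistant"]
    turns = [{"role": roles[i % 2], "content": f"turn {i}"} for i in range(num_turns)]
    return json.dumps({"id": f"demo-{num_turns}", "conversations": turns, "status": "success"})


sample_lines = [make_line(2), make_line(4), "", make_line(2)]
histogram = turn_histogram(sample_lines)
```

Because the function accepts any iterable of strings, an open file handle can be passed in place of `sample_lines`, so the full file never has to be held in memory.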
### Train with SpecForge
```bash
NUM_GPUS=4
TP_SIZE=1
torchrun \
    --standalone \
    --nproc_per_node $NUM_GPUS \
    scripts/train_eagle3.py \
    --target-model-path swiss-ai/Apertus-8B-Instruct-2509 \
    --draft-model-config /path/to/configs/apertus-8b-eagle3.json \
    --train-data-path /path/to/merged_train_regen.jsonl \
    --output-dir /path/to/outputs/apertus-8b-eagle3 \
    --num-epochs 10 \
    --batch-size 1 \
    --tp-size $TP_SIZE \
    --learning-rate 1e-4 \
    --max-length 4096 \
    --chat-template apertus \
    --cache-dir /path/to/cache \
    --target-model-backend sglang
```
## Related Resources
- [EAGLE3-Apertus-8B-Instruct-2509](https://huggingface.co/thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509) - Trained draft model
- [SpecForge](https://github.com/sgl-project/SpecForge) - Training framework
- [Eagle3 Paper](https://arxiv.org/abs/2503.01840)
- [Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509) - Target model
## License
Apache 2.0
## Citation
```bibtex
@article{li2025eagle3,
  title={Eagle 3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2503.01840},
  year={2025}
}
```
Provider:
thomaskiefer



