thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data

Name: thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data
Creator: thomaskiefer
Published: 2025-11-30 14:10:09
License: Apache 2.0 (per the dataset card)

Hugging Face · Updated 2025-11-30 · Indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data

Description:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- eagle3
- speculative-decoding
- conversation
- synthetic
size_categories:
- 100K<n<1M
---

# EAGLE3-Apertus-8B-Instruct-2509-Data

Training dataset for the [thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509](https://huggingface.co/thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509) speculative decoding draft model.

## Dataset Description

This dataset contains ~375k multi-turn conversations used to train an Eagle3 draft model for [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509).

### Data Sources

The prompts are sourced from:

- [UltraChat](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) - Large-scale multi-turn dialogue dataset
- [ShareGPT](https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered) - Real user conversations
- [OpenThoughts-114k-math](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math) - Mathematical reasoning data

### Regeneration Process

The responses were **regenerated** using Apertus-8B-Instruct-2509 rather than using the original responses. This ensures the draft model learns to predict tokens from the target model's own output distribution, which is critical for effective speculative decoding.

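
The regeneration step described above can be pictured roughly as follows. This is only a minimal sketch, not the script actually used to build the dataset: `regenerate_conversation` and `echo_model` are hypothetical names, and `generate` stands in for whatever call produces a reply from Apertus-8B-Instruct-2509. It assumes each turn is a dict with `role` and `content` keys, that only user turns are carried over, and that the model is conditioned on its own earlier replies rather than on the original ones.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def regenerate_conversation(
    turns: List[Message],
    generate: Callable[[List[Message]], str],
) -> List[Message]:
    """Keep the user turns and replace every assistant turn with a new reply.

    `generate` is a hypothetical callable that maps the history so far to the
    target model's next reply; plug in a real inference call here.
    """
    regenerated: List[Message] = []
    for turn in turns:
        if turn["role"] != "user":
            # Original assistant (and any other non-user) turns are dropped.
            continue
        regenerated.append({"role": "user", "content": turn["content"]})
        # The history passed to the model contains only regenerated replies.
        reply = generate(list(regenerated))
        regenerated.append({"role": "assistant", "content": reply})
    return regenerated


def echo_model(history: List[Message]) -> str:
    """Stub standing in for the target model, used only for illustration."""
    return f"reply to: {history[-1]['content']}"


original = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Original answer"},
    {"role": "user", "content": "More?"},
]
regenerated = regenerate_conversation(original, echo_model)
```

With a real model behind `generate`, every assistant turn in `regenerated` comes from the target model, which is the property the paragraph above relies on.
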
## Dataset Format

JSONL format with the following structure:

```json
{
  "id": "unique_sha256_hash",
  "conversations": [
    {"role": "user", "content": "User message..."},
    {"role": "assistant", "content": "Regenerated assistant response..."},
    {"role": "user", "content": "Follow-up question..."},
    {"role": "assistant", "content": "Regenerated follow-up response..."}
  ],
  "status": "success"
}
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | SHA-256 hash identifier for the conversation |
| `conversations` | array | List of conversation turns with role and content |
| `status` | string | Processing status (`success` indicates valid sample) |

## Dataset Statistics

- **Format**: JSONL
- **Samples**: 375,573
- **Languages**: Primarily English

## Usage

### Load with Hugging Face Datasets

```python
from datasets import load_dataset

dataset = load_dataset("thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data")
```

### Load Directly

```python
import json

conversations = []
with open("merged_train_regen.jsonl", "r") as f:
    for line in f:
        conversations.append(json.loads(line))
```

### Train with SpecForge

```bash
NUM_GPUS=4
TP_SIZE=1

torchrun \
    --standalone \
    --nproc_per_node $NUM_GPUS \
    scripts/train_eagle3.py \
    --target-model-path swiss-ai/Apertus-8B-Instruct-2509 \
    --draft-model-config /path/to/configs/apertus-8b-eagle3.json \
    --train-data-path /path/to/merged_train_regen.jsonl \
    --output-dir /path/to/outputs/apertus-8b-eagle3 \
    --num-epochs 10 \
    --batch-size 1 \
    --tp-size $TP_SIZE \
    --learning-rate 1e-4 \
    --max-length 4096 \
    --chat-template apertus \
    --cache-dir /path/to/cache \
    --target-model-backend sglang
```

## Related Resources

- [EAGLE3-Apertus-8B-Instruct-2509](https://huggingface.co/thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509) - Trained draft model
- [SpecForge](https://github.com/sgl-project/SpecForge) - Training framework
- [Eagle3 Paper](https://arxiv.org/abs/2503.01840)
- [Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509) - Target model

## License

Apache 2.0

## Citation

```bibtex
@article{li2025eagle3,
  title={Eagle 3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2503.01840},
  year={2025}
}
```
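
## Example: Filtering Records

Since the `status` field marks whether a sample is valid, a loader may want to keep only `success` records before training. The sketch below is one way to do that and is not part of the dataset tooling: `iter_valid_records` is a hypothetical helper, and the additional check that turns strictly alternate between `user` and `assistant` (ending on an assistant turn) is an assumption drawn from the example record, not a documented guarantee.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_valid_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield parsed JSONL records that look usable for training.

    A record is kept when its `status` is "success" and its turns alternate
    user/assistant, starting with a user turn and ending with an assistant
    turn. The alternation rule is an assumption made for this sketch.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("status") != "success":
            continue
        roles = [turn.get("role") for turn in record.get("conversations", [])]
        expected = ["user", "assistant"] * (len(roles) // 2)
        if roles and roles == expected:
            yield record
```

Because the function accepts any iterable of strings, it can be fed an open file handle directly, for example `with open("merged_train_regen.jsonl", encoding="utf-8") as f: records = list(iter_valid_records(f))`.
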
