CMU-FLAME/FLAME-MoE-Traces

Name: CMU-FLAME/FLAME-MoE-Traces
Creator: CMU-FLAME
Published: 2026-03-30 12:55:50
License: apache-2.0

Source: Hugging Face. Updated: 2026-03-30. Indexed: 2026-04-12.

Download link:

https://hf-mirror.com/datasets/CMU-FLAME/FLAME-MoE-Traces

Description:

---
language:
- en
license: apache-2.0
pretty_name: FLAME-MoE Routing Traces
size_categories:
- 1B<n<10B
task_categories:
- text-generation
tags:
- mixture-of-experts
- routing-traces
- moe
---

# FLAME-MoE Routing Traces

Routing traces captured during pretraining of [FLAME-MoE](https://github.com/cmu-flame/FLAME-MoE) Mixture-of-Experts language models. For each token processed by the model, these traces record which experts the router selected (top-k expert IDs) and the corresponding gating probabilities (router softmax scores).

**Architecture**

| Model | Params (Active/Total) | Transformer Layers | MoE Layers | Routed Experts | Shared Experts | Top-k |
|:-----:|:---------------------:|:------------------:|:----------:|:--------------:|:--------------:|:-----:|
| FLAME-MoE-290M | 290M / 1.3B | 9 | 8 (layers 2-9) | 64 | 2 | 6 |
| FLAME-MoE-721M | 721M / 3.8B | 13 | 11 (layers 2-12) | 64 | 2 | 6 |
| FLAME-MoE-1.7B | 1.7B / 10.3B | 19 | 17 (layers 2-18) | 64 | 2 | 6 |

The 2 shared experts are always active and **not** included in the traces. Only the 64 routed experts are logged.

**Data Layout**

```
flame-moe-{290m,721m,1.7b}/
├── samples/
│   ├── 000.parquet ... NNN.parquet
└── actives/
    ├── iter_NNNN/
    │   ├── layer_02.parquet ... layer_NN.parquet
    └── ...
```

- **`samples/`**: Token IDs fed into the model. Shared across all iterations (same data order for every checkpoint).
- **`actives/`**: Router decisions per (iteration, layer). One parquet file per MoE layer per training checkpoint.
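
The layout above maps directly onto file paths. The following is a minimal sketch of a path builder, not part of the dataset or the FLAME-MoE tooling: `actives_path` and `layer_paths` are hypothetical helpers, the MoE layer ranges are copied from the architecture table, and the zero-padding width of the iteration directory is an assumption inferred from the `iter_NNNN` placeholder (only `iter_5473` appears verbatim in the card).

```python
from pathlib import PurePosixPath
from typing import List

# Inclusive MoE layer ranges, copied from the architecture table.
MOE_LAYERS = {
    "flame-moe-290m": range(2, 10),  # layers 2-9
    "flame-moe-721m": range(2, 13),  # layers 2-12
    "flame-moe-1.7b": range(2, 19),  # layers 2-18
}


def actives_path(model: str, iteration: int, layer: int, pad: int = 4) -> str:
    """Relative path of one actives file (hypothetical helper).

    ASSUMPTION: the iteration is zero-padded to `pad` digits, guessed from
    the `iter_NNNN` placeholder. Adjust `pad` if the repository differs.
    """
    if model not in MOE_LAYERS:
        raise ValueError(f"unknown model directory: {model}")
    if layer not in MOE_LAYERS[model]:
        raise ValueError(f"layer {layer} is not an MoE layer of {model}")
    path = (
        PurePosixPath(model)
        / "actives"
        / f"iter_{iteration:0{pad}d}"
        / f"layer_{layer:02d}.parquet"
    )
    return str(path)


def layer_paths(model: str, iteration: int, pad: int = 4) -> List[str]:
    """Paths of every MoE layer file for one checkpoint, in layer order."""
    return [actives_path(model, iteration, layer, pad) for layer in MOE_LAYERS[model]]


print(actives_path("flame-moe-290m", 5473, 2))
# flame-moe-290m/actives/iter_5473/layer_02.parquet
```

Each returned string can be passed as `data_files` to `load_dataset` or, after downloading, to `pq.read_table`.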
**Schema**

`samples/*.parquet`: each row is one token.

| Column | Type | Description |
|:------:|:----:|:------------|
| `token_id` | `int32` | Input token ID |

`actives/iter_NNNN/layer_NN.parquet`: each row is one token's routing decision.

| Column | Type | Description |
|:------:|:----:|:------------|
| `scores` | `list<float16>[6]` | Router softmax probabilities for the top-6 selected experts, sorted descending |
| `indices` | `list<int16>[6]` | Routed expert IDs (0-63) corresponding to each score |

Row `i` in an actives file aligns with row `i` in the samples files. Each capture contains 52,428,800 tokens.

**Checkpoints Captured**

| Model | Iterations |
|:-----:|:----------:|
| FLAME-MoE-290M | 540, 1080, 1620, 2160, 2700, 3240, 3780, 4320, 4860, 5400, 5473 |
| FLAME-MoE-721M | 880, 1760, 2640, 3520, 4400, 5280, 6160, 7040, 7920, 8800, 8815 |
| FLAME-MoE-1.7B | 1100, 2200, 3300, 4400, 5500, 6600, 7700, 8800, 9900, 11000, 11029 |

**Quick Start**

```python
import pyarrow.parquet as pq

# Load routing decisions for iteration 5473, layer 2
actives = pq.read_table("flame-moe-290m/actives/iter_5473/layer_02.parquet")

# Each row is one token
row = actives.slice(0, 1)
print(row.column("indices")[0].as_py())  # e.g. [34, 28, 21, 47, 3, 12]
print(row.column("scores")[0].as_py())   # e.g. [0.0998, 0.0523, 0.0417, 0.0384, 0.0326, 0.0296]

# Load corresponding token IDs
samples = pq.read_table("flame-moe-290m/samples/000.parquet")
print(samples.column("token_id")[0].as_py())  # e.g. 1512
```

With HuggingFace Datasets (streaming, no full download):

```python
from datasets import load_dataset

ds = load_dataset(
    "CMU-FLAME/FLAME-MoE-Traces",
    data_files="flame-moe-290m/actives/iter_5473/layer_02.parquet",
    split="train",
    streaming=True,
)

for row in ds.take(5):
    print(row["indices"], row["scores"])
```

**Citation**

```bibtex
@article{kang2025flame,
  title={FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models},
  author={Kang, Hao and Yu, Zichun and Xiong, Chenyan},
  journal={arXiv preprint arXiv:2505.20225},
  year={2025}
}
```
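
**Example: Expert Load**

The `indices` column described in the schema lends itself to simple load-balance checks. The sketch below is illustrative only: `expert_load` is a hypothetical helper, not part of the dataset or the FLAME-MoE codebase. It assumes rows shaped like those yielded by the streaming example (dicts with an `indices` list), and the two rows used here are synthetic rather than taken from the traces.

```python
from collections import Counter
from typing import Dict, Iterable, List

NUM_ROUTED_EXPERTS = 64  # routed experts per MoE layer, from the architecture table


def expert_load(
    rows: Iterable[Dict[str, list]], num_experts: int = NUM_ROUTED_EXPERTS
) -> List[float]:
    """Fraction of all routing slots (tokens x top-k) assigned to each expert."""
    counts: Counter = Counter()
    slots = 0
    for row in rows:
        counts.update(row["indices"])  # one slot per selected expert
        slots += len(row["indices"])
    if slots == 0:
        return [0.0] * num_experts
    return [counts[expert] / slots for expert in range(num_experts)]


# Synthetic rows for illustration; real rows would come from an actives file.
rows = [
    {"indices": [34, 28, 21, 47, 3, 12]},
    {"indices": [34, 3, 5, 9, 60, 1]},
]

load = expert_load(rows)
print(round(load[34], 4))  # 0.1667 (expert 34 fills 2 of the 12 slots)
print(round(load[0], 4))   # 0.0 (expert 0 is never selected here)
```

Under perfectly uniform routing each of the 64 experts would receive 1/64 (about 0.0156) of the slots, so comparing `load` against that value per layer and per checkpoint gives a rough picture of how balanced the router is. The function consumes any iterable, so it can be fed a streaming dataset without loading a full 52,428,800-token capture into memory.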

Provider:

CMU-FLAME
