0xSero/reap-calibration-data-v1
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/0xSero/reap-calibration-data-v1
Resource summary:
---
language:
- en
license: apache-2.0
tags:
- calibration
- moe
- expert-pruning
- reap
- benchmark-free
size_categories:
- 10K<n<100K
task_categories:
- text-generation
---
# REAP Calibration Dataset v1
Benchmark-free calibration dataset for [REAP](https://arxiv.org/abs/2510.13999) (Router-weighted Expert Activation Pruning) of Mixture-of-Experts (MoE) language models.
## What This Dataset Does
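REAP prunes MoE models by removing the experts that contribute least to each layer's output, scoring them by their router gate values and activation magnitudes. To decide which experts are safe to remove, REAP needs to **observe** how experts are routed and how strongly they activate on diverse inputs. This dataset provides those inputs.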
**This is NOT training data.** No model weights are updated. The dataset is fed through the model in inference mode to record expert routing statistics. Those statistics then guide the pruning decisions.
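As a rough illustration of how recorded statistics can turn into pruning decisions, here is a minimal sketch of a router-weighted saliency score. It assumes the criterion summarized in the REAP paper's abstract (router gate values combined with expert activation norms), averaged over the tokens routed to each expert. The observation tuple format and the function names are hypothetical and are not the REAP observer's API; consult the paper for the exact criterion.

```python
from collections import defaultdict

# Hypothetical observation format (not the REAP observer's): one tuple per
# (token, selected expert) pair, holding the router gate value for that expert
# and the L2 norm of the expert's output on that token.
Observation = tuple[int, float, float]  # (expert_id, gate_value, output_norm)


def saliency(observations: list[Observation]) -> dict[int, float]:
    """Average gate_value * output_norm over the tokens routed to each expert."""
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for expert_id, gate, norm in observations:
        totals[expert_id] += gate * norm
        counts[expert_id] += 1
    return {e: totals[e] / counts[e] for e in totals}


def experts_to_prune(scores: dict[int, float], num_experts: int, ratio: float) -> list[int]:
    """Pick the lowest-scoring experts; experts never routed to score 0."""
    full = {e: scores.get(e, 0.0) for e in range(num_experts)}
    k = int(num_experts * ratio)
    return sorted(full, key=lambda e: full[e])[:k]


# Toy layer with 4 experts; expert 3 never fires on the calibration inputs.
obs = [(0, 0.5, 2.0), (0, 0.25, 4.0), (1, 0.25, 1.0), (2, 0.75, 4.0)]
scores = saliency(obs)                    # {0: 1.0, 1: 0.25, 2: 3.0}
print(experts_to_prune(scores, 4, 0.5))   # [3, 1]
```

Because this score is averaged over routed tokens, it measures how much an expert contributes when it is used rather than only how often it is used, so the calibration inputs need to reach the domains where each expert matters.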
## Key Property: Zero Benchmark Contamination
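This dataset was curated to **exclude the common evaluation benchmarks listed below**, so that REAP pruning decisions are not biased toward benchmark-specific patterns. The one deliberate exception is the MMLU physics and chemistry subsets, which are included for science coverage.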
### Excluded Benchmarks
HumanEval, MBPP, EvalPlus, SWE-bench (all variants), TerminalBench, GSM8K, MATH-500, GAIA, KernelBench, ARC, BoolQ, HellaSwag, WinoGrande, MMLU (except physics/chemistry for science coverage), TruthfulQA, PIQA, OpenBookQA, MathQA, LiveCodeBench.
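The card lists which benchmarks were excluded but not how the exclusion was verified. One common way to audit a claim like this is word n-gram overlap against the benchmark texts. The sketch below uses that swapped-in technique; it is not the dataset author's procedure, the function names are hypothetical, and the default 13-word window is a conventional decontamination choice rather than something this card specifies.

```python
import re

NGram = tuple[str, ...]


def ngrams(text: str, n: int) -> set[NGram]:
    """Lowercased word n-grams (punctuation is ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}


def build_index(benchmark_texts: list[str], n: int = 13) -> set[NGram]:
    """Union of all n-grams that appear in any benchmark item."""
    index: set[NGram] = set()
    for text in benchmark_texts:
        index |= ngrams(text, n)
    return index


def is_contaminated(sample: str, index: set[NGram], n: int = 13) -> bool:
    """True if the calibration sample shares at least one n-gram with the index."""
    return not ngrams(sample, n).isdisjoint(index)


# Toy demo with a short window; a real audit would index the benchmark texts.
index = build_index(["the quick brown fox jumps over the lazy dog"], n=5)
print(is_contaminated("I saw the quick brown fox jumps over a fence", index, n=5))  # True
print(is_contaminated("tiling a CUDA kernel for shared memory", index, n=5))       # False
```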
## Dataset Composition
**23,088 samples** across 10 domains, weighted toward coding and tool-use workloads:
### Domain Breakdown
| Domain | Samples | % | Description |
|--------|---------|---|-------------|
| **Function Calling** | 5,000 | 21.7% | Structured tool/API invocations with parameters, return types, and multi-step chains |
| **Agentic Traces** | 3,893 | 16.9% | Multi-turn agent trajectories with reasoning, tool calls, and environment feedback |
| **Cybersecurity** | 3,000 | 13.0% | OWASP, MITRE ATT&CK, incident response, cloud security, cryptography |
| **General Coding** | 2,000 | 8.7% | Diverse programming across languages and paradigms |
| **Deep Reasoning** | 2,000 | 8.7% | Competition math with chain-of-thought, logical reasoning, problem solving |
| **Math** | 2,000 | 8.7% | Real-world Math StackExchange and MathOverflow Q&A with LaTeX |
| **CUDA Programming** | 2,000 | 8.7% | GPU kernels, optimization, profiling data across difficulty levels |
| **Terminal / CLI** | 1,500 | 6.5% | Shell commands, system administration, CLI workflows |
| **Long Context** | 1,500 | 6.5% | 8K-16K token instruction-following traces |
| **Science** | 195 | 0.8% | College-level physics and chemistry |
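The percentages follow directly from the sample counts. A quick check using only the numbers in the table above:

```python
DOMAIN_COUNTS = {
    "Function Calling": 5000,
    "Agentic Traces": 3893,
    "Cybersecurity": 3000,
    "General Coding": 2000,
    "Deep Reasoning": 2000,
    "Math": 2000,
    "CUDA Programming": 2000,
    "Terminal / CLI": 1500,
    "Long Context": 1500,
    "Science": 195,
}

total = sum(DOMAIN_COUNTS.values())
# Share of each domain, rounded to one decimal place as in the table.
shares = {name: round(100 * count / total, 1) for name, count in DOMAIN_COUNTS.items()}

print(total)  # 23088
for name, pct in shares.items():
    print(f"{name:<18} {pct:>5.1f}%")
```

The rounded shares add up to 100.2% rather than exactly 100% because each one is rounded independently.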
### Source Datasets
| Source | Samples | Domain | License |
|--------|---------|--------|---------|
| [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) | 2,000 | Function Calling | CC-BY-4.0 |
| [interstellarninja/hermes_reasoning_tool_use](https://huggingface.co/datasets/interstellarninja/hermes_reasoning_tool_use) | 1,500 | Function Calling | Open |
| [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | 1,500 | Function Calling | Open |
| [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) | 1,893 | Agentic | Open |
| [argilla/distilabel-reasoning-prompts](https://huggingface.co/datasets/argilla/distilabel-reasoning-prompts) | 2,000 | Agentic | Apache-2.0 |
| [AlicanKiraz0/Cybersecurity-Dataset-Fenrir-v2.0](https://huggingface.co/datasets/AlicanKiraz0/Cybersecurity-Dataset-Fenrir-v2.0) | 2,000 | Cybersecurity | Apache-2.0 |
| [Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset](https://huggingface.co/datasets/Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset) | 1,000 | Cybersecurity | Open |
| [nvidia/OpenCodeInstruct](https://huggingface.co/datasets/nvidia/OpenCodeInstruct) | 2,000 | Coding | Open |
| [AI-MO/NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) | 2,000 | Deep Reasoning | Open |
| [math-ai/StackMathQA](https://huggingface.co/datasets/math-ai/StackMathQA) | 2,000 | Math | CC-BY-SA |
| [SakanaAI/AI-CUDA-Engineer-Archive](https://huggingface.co/datasets/SakanaAI/AI-CUDA-Engineer-Archive) | 2,000 | CUDA | CC-BY-4.0 |
| [b-mc2/cli-commands-explained](https://huggingface.co/datasets/b-mc2/cli-commands-explained) | 1,500 | Terminal | Open |
| [THUDM/LongAlign-10k](https://huggingface.co/datasets/THUDM/LongAlign-10k) | 1,500 | Long Context | Open |
| [cais/mmlu](https://huggingface.co/datasets/cais/mmlu) (physics + chemistry) | 195 | Science | MIT |
## REAP Packing Strategy
Per the [REAP paper](https://arxiv.org/abs/2510.13999), for models ≥110B parameters:
- **No packing** — each sample is its own sequence
- **Max sequence length**: 16,384 tokens
- Samples longer than 16K tokens are truncated
- The REAP observer handles tokenization and batching at runtime
- **Batch size**: 8 sequences per forward pass
For models <110B parameters, the paper recommends packing multiple samples to fill 2,048-token sequences.
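The two regimes differ only in how token sequences are formed before batching. Below is a minimal sketch over already-tokenized samples (lists of token IDs), assuming simple truncation for the no-packing case and plain concatenate-and-split for packing. The paper's exact packing procedure (separator tokens, handling of the final partial sequence) may differ, and the function names are hypothetical.

```python
def no_packing(samples: list[list[int]], max_len: int = 16_384) -> list[list[int]]:
    """One sequence per sample; anything past max_len tokens is truncated."""
    return [sample[:max_len] for sample in samples]


def pack(samples: list[list[int]], seq_len: int = 2_048) -> list[list[int]]:
    """Concatenate all samples and split the stream into seq_len-token chunks.

    The last chunk may be shorter than seq_len; this sketch keeps it.
    """
    stream = [tok for sample in samples for tok in sample]
    return [stream[i : i + seq_len] for i in range(0, len(stream), seq_len)]


def batches(seqs: list[list[int]], batch_size: int = 8) -> list[list[list[int]]]:
    """Group sequences into forward-pass batches of batch_size sequences."""
    return [seqs[i : i + batch_size] for i in range(0, len(seqs), batch_size)]


samples = [[1] * 20_000, [2] * 1_500, [3] * 600]
print([len(s) for s in no_packing(samples)])  # [16384, 1500, 600]
packed = pack(samples)
print(len(packed), len(packed[-1]))           # 11 1620
print([len(b) for b in batches(packed)])      # [8, 3]
```

Without packing, short samples leave much of a 16K window unused; packing fills every sequence, at the cost of placing tokens from different samples in the same context.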
## Format
JSONL (one JSON object per line) with the fields below; the example record is pretty-printed for readability:
```json
{
  "id": "function_calling_0",
  "domain": "function_calling",
  "repo_id": "Salesforce/xlam-function-calling-60k",
  "subset": "default",
  "text": "..."
}
```
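A minimal loader for this format, using only the Python standard library. It checks that each record carries the five fields shown above and tallies samples per `domain`. The function names are hypothetical and not part of the REAP tooling.

```python
import json
from collections import Counter
from typing import Iterable

REQUIRED_FIELDS = {"id", "domain", "repo_id", "subset", "text"}


def parse_records(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL lines, skipping blanks and rejecting records with missing fields."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        records.append(record)
    return records


def load_calibration(path: str) -> list[dict]:
    """Read a calibration JSONL file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_records(f)


def domain_counts(records: list[dict]) -> Counter:
    """Number of samples per domain label."""
    return Counter(record["domain"] for record in records)
```

For example, `domain_counts(load_calibration("calibration-v1.jsonl")).most_common()` tallies the per-domain sample counts, assuming the file is saved under the name used in the command below.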
## Usage with REAP
```bash
python scripts/run_qwen35_layerwise_observations_pr17.py \
--dataset-jsonl calibration-v1.jsonl \
--max-tokens 16384 \
--batch-size 8 \
--observation-sequence-chunk-size 1 \
--max-group-batches 20 \
--checkpoint-every-samples 800
```
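For a sense of scale, the arithmetic below estimates the size of one observation run. It assumes `--batch-size` counts sequences per forward pass (as the packing section states) and that `--checkpoint-every-samples` writes a checkpoint after every 800 processed samples; the semantics of `--max-group-batches` and `--observation-sequence-chunk-size` are not documented here, so they are left out.

```python
import math

NUM_SAMPLES = 23_088       # total samples (Dataset Composition)
BATCH_SIZE = 8             # --batch-size
MAX_TOKENS = 16_384        # --max-tokens
CHECKPOINT_EVERY = 800     # --checkpoint-every-samples (assumed semantics)

forward_passes = math.ceil(NUM_SAMPLES / BATCH_SIZE)
checkpoints = NUM_SAMPLES // CHECKPOINT_EVERY   # periodic checkpoints only
max_tokens_per_pass = BATCH_SIZE * MAX_TOKENS
max_total_tokens = NUM_SAMPLES * MAX_TOKENS     # upper bound: every sample at the cap

print(forward_passes)       # 2886
print(checkpoints)          # 28
print(max_tokens_per_pass)  # 131072
print(max_total_tokens)     # 378273792
```

Only the Long Context domain is described as reaching 8K-16K tokens, so the actual token count is far below that upper bound.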
## Models Calibrated With This Dataset
- [Qwen3.5-122B-A10B-REAP-20](https://huggingface.co/0xSero/Qwen3.5-122B-A10B-REAP-20) (20% pruned, 97.9% capability retained)
- [Qwen3.5-122B-A10B-REAP-30](https://huggingface.co/0xSero/Qwen3.5-122B-A10B-REAP-30) (30% pruned)
- [Qwen3.5-122B-A10B-REAP-40](https://huggingface.co/0xSero/Qwen3.5-122B-A10B-REAP-40) (40% pruned)
- Qwen3.5-397B-A17B (observations in progress)
## Maintainer
- **Author**: [0xSero](https://huggingface.co/0xSero)
- **Organization**: Sybil Solutions
- **Project**: REAP PR17
## Citation
```bibtex
@article{lasby2025reap,
title={REAP the Experts: Why Pruning Prevails for One-Shot MoE compression},
author={Lasby, Mike and others},
journal={arXiv preprint arXiv:2510.13999},
year={2025}
}
```
Provided by:
0xSero