0xSero/reap-calibration-data-v1

Name: 0xSero/reap-calibration-data-v1
Creator: 0xSero
Published: 2026-04-10 12:42:29
License: not specified

Source: Hugging Face. Updated 2026-04-10. Indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/0xSero/reap-calibration-data-v1


Description:

---
language:
- en
license: apache-2.0
tags:
- calibration
- moe
- expert-pruning
- reap
- benchmark-free
size_categories:
- 10K<n<100K
task_categories:
- text-generation
---

# REAP Calibration Dataset v1

Benchmark-free calibration dataset for [REAP](https://arxiv.org/abs/2510.13999) (Routing-Enhanced Activation Pruning) of Mixture-of-Experts language models.

## What This Dataset Does

REAP prunes MoE models by removing experts that rarely activate. To decide which experts are safe to remove, REAP needs to **observe** which experts fire on diverse inputs. This dataset provides those inputs.

**This is NOT training data.** No model weights are updated. The dataset is fed through the model in inference mode to record expert routing statistics. Those statistics then guide the pruning decisions.

## Key Property: Zero Benchmark Contamination

This dataset was specifically curated to **exclude all common evaluation benchmarks**, ensuring REAP pruning decisions are not biased toward benchmark-specific patterns.

### Excluded Benchmarks

HumanEval, MBPP, EvalPlus, SWE-bench (all variants), TerminalBench, GSM8K, MATH-500, GAIA, KernelBench, ARC, BoolQ, HellaSwag, WinoGrande, MMLU (except physics/chemistry for science coverage), TruthfulQA, PIQA, OpenBookQA, MathQA, LiveCodeBench.
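
An exclusion list like the one above could be enforced on candidate sources roughly as follows. This is a hypothetical sketch: the blocklist spellings, the `is_excluded` helper, and the name-prefix matching rule are illustrative assumptions, not the curation procedure actually used for this dataset.

```python
# Illustrative only: blocklist spellings and the matching rule are assumptions.
EXCLUDED = {
    "humaneval", "mbpp", "evalplus", "swe-bench", "terminalbench", "gsm8k",
    "math-500", "gaia", "kernelbench", "arc", "boolq", "hellaswag",
    "winogrande", "mmlu", "truthfulqa", "piqa", "openbookqa", "mathqa",
    "livecodebench",
}
# MMLU is allowed only for physics/chemistry subsets (science coverage).
MMLU_ALLOWED_KEYWORDS = ("physics", "chemistry")


def is_excluded(repo_id: str, subset: str = "") -> bool:
    """Return True if a source looks like one of the excluded benchmarks."""
    # Compare on the dataset name only, e.g. "openai/gsm8k" -> "gsm8k".
    name = repo_id.split("/")[-1].lower().replace("_", "-")
    for bench in EXCLUDED:
        # Match the exact name or a variant such as "swe-bench-verified".
        # Requiring "-" after the prefix avoids false hits such as "arc"
        # at the start of an unrelated name like "archive".
        if name == bench or name.startswith(bench + "-"):
            allowed_subset = any(k in subset.lower() for k in MMLU_ALLOWED_KEYWORDS)
            if bench == "mmlu" and allowed_subset:
                return False
            return True
    return False
```

Under this rule, `is_excluded("cais/mmlu", "college_physics")` is false while any other MMLU subset is rejected. Matching on the dataset name with a delimiter, rather than on raw substrings, keeps sources such as `SakanaAI/AI-CUDA-Engineer-Archive` from being dropped just because the name contains "arc".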

## Dataset Composition

**23,088 samples** across 10 domains, weighted toward coding and tool-use workloads:

### Domain Breakdown

| Domain | Samples | % | Description |
|--------|---------|---|-------------|
| **Function Calling** | 5,000 | 21.7% | Structured tool/API invocations with parameters, return types, and multi-step chains |
| **Agentic Traces** | 3,893 | 16.9% | Multi-turn agent trajectories with reasoning, tool calls, and environment feedback |
| **Cybersecurity** | 3,000 | 13.0% | OWASP, MITRE ATT&CK, incident response, cloud security, cryptography |
| **General Coding** | 2,000 | 8.7% | Diverse programming across languages and paradigms |
| **Deep Reasoning** | 2,000 | 8.7% | Competition math with chain-of-thought, logical reasoning, problem solving |
| **Math** | 2,000 | 8.7% | Real math StackExchange/MathOverflow Q&A with LaTeX |
| **CUDA Programming** | 2,000 | 8.7% | GPU kernels, optimization, profiling data across difficulty levels |
| **Terminal / CLI** | 1,500 | 6.5% | Shell commands, system administration, CLI workflows |
| **Long Context** | 1,500 | 6.5% | 8K-16K token instruction-following traces |
| **Science** | 195 | 0.8% | College-level physics and chemistry |

### Source Datasets

| Source | Samples | Domain | License |
|--------|---------|--------|---------|
| [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) | 2,000 | Function Calling | CC-BY-4.0 |
| [interstellarninja/hermes_reasoning_tool_use](https://huggingface.co/datasets/interstellarninja/hermes_reasoning_tool_use) | 1,500 | Function Calling | Open |
| [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | 1,500 | Function Calling | Open |
| [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) | 1,893 | Agentic | Open |
| [argilla/distilabel-reasoning-prompts](https://huggingface.co/datasets/argilla/distilabel-reasoning-prompts) | 2,000 | Agentic | Apache-2.0 |
| [AlicanKiraz0/Cybersecurity-Dataset-Fenrir-v2.0](https://huggingface.co/datasets/AlicanKiraz0/Cybersecurity-Dataset-Fenrir-v2.0) | 2,000 | Cybersecurity | Apache-2.0 |
| [Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset](https://huggingface.co/datasets/Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset) | 1,000 | Cybersecurity | Open |
| [nvidia/OpenCodeInstruct](https://huggingface.co/datasets/nvidia/OpenCodeInstruct) | 2,000 | Coding | Open |
| [AI-MO/NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) | 2,000 | Deep Reasoning | Open |
| [math-ai/StackMathQA](https://huggingface.co/datasets/math-ai/StackMathQA) | 2,000 | Math | CC-BY-SA |
| [SakanaAI/AI-CUDA-Engineer-Archive](https://huggingface.co/datasets/SakanaAI/AI-CUDA-Engineer-Archive) | 2,000 | CUDA | CC-BY-4.0 |
| [b-mc2/cli-commands-explained](https://huggingface.co/datasets/b-mc2/cli-commands-explained) | 1,500 | Terminal | Open |
| [THUDM/LongAlign-10k](https://huggingface.co/datasets/THUDM/LongAlign-10k) | 1,500 | Long Context | Open |
| [cais/mmlu](https://huggingface.co/datasets/cais/mmlu) (physics + chemistry) | 195 | Science | MIT |

## REAP Packing Strategy

Per the [REAP paper](https://arxiv.org/abs/2510.13999), for models ≥110B parameters:

- **No packing**: each sample is its own sequence
- **Max sequence length**: 16,384 tokens
- Samples longer than 16K tokens are truncated
- The REAP observer handles tokenization and batching at runtime
- **Batch size**: 8 sequences per forward pass

For models <110B parameters, the paper recommends packing multiple samples to fill 2,048-token sequences.

## Format

JSONL with fields:

```json
{
  "id": "function_calling_0",
  "domain": "function_calling",
  "repo_id": "Salesforce/xlam-function-calling-60k",
  "subset": "default",
  "text": "..."
}
```

## Usage with REAP

```bash
python scripts/run_qwen35_layerwise_observations_pr17.py \
  --dataset-jsonl calibration-v1.jsonl \
  --max-tokens 16384 \
  --batch-size 8 \
  --observation-sequence-chunk-size 1 \
  --max-group-batches 20 \
  --checkpoint-every-samples 800
```

## Models Calibrated With This Dataset

- [Qwen3.5-122B-A10B-REAP-20](https://huggingface.co/0xSero/Qwen3.5-122B-A10B-REAP-20) (20% pruned, 97.9% capability retained)
- [Qwen3.5-122B-A10B-REAP-30](https://huggingface.co/0xSero/Qwen3.5-122B-A10B-REAP-30) (30% pruned)
- [Qwen3.5-122B-A10B-REAP-40](https://huggingface.co/0xSero/Qwen3.5-122B-A10B-REAP-40) (40% pruned)
- Qwen3.5-397B-A17B (observations in progress)

## Maintainer

- **Author**: [0xSero](https://huggingface.co/0xSero)
- **Organization**: Sybil Solutions
- **Project**: REAP PR17

## Citation

```bibtex
@article{lu2025reap,
  title={Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture of Experts},
  author={Lu, Xudong and Qiu, Liu and Huang, Jinhao and others},
  journal={arXiv preprint arXiv:2510.13999},
  year={2025}
}
```
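
Before running the observer, the record layout described under Format can be sanity-checked with a few lines of standard-library Python. This is a minimal sketch: `load_records` and `domain_counts` are hypothetical helper names, and the two sample lines are made up for illustration. Only the field names, the `function_calling` domain value, and the `calibration-v1.jsonl` filename come from the dataset card.

```python
import io
import json
from collections import Counter

# Field names taken from the example record in the Format section.
REQUIRED_FIELDS = ("id", "domain", "repo_id", "subset", "text")


def load_records(lines):
    """Parse JSONL lines, skipping blanks and checking the expected fields."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        records.append(record)
    return records


def domain_counts(records):
    """Count samples per domain, e.g. to compare with the domain breakdown."""
    return Counter(r["domain"] for r in records)


# Made-up sample lines; a real run would pass open("calibration-v1.jsonl").
sample = io.StringIO(
    '{"id": "function_calling_0", "domain": "function_calling", '
    '"repo_id": "Salesforce/xlam-function-calling-60k", '
    '"subset": "default", "text": "..."}\n'
    '{"id": "function_calling_1", "domain": "function_calling", '
    '"repo_id": "Salesforce/xlam-function-calling-60k", '
    '"subset": "default", "text": "..."}\n'
)
counts = domain_counts(load_records(sample))
```

On the full file, the per-domain totals from `domain_counts` should line up with the Samples column of the domain breakdown table if the download is complete.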

Provider:

0xSero
