masterpieceexternal/gpt-oss-20b-moe-expert-power-traces-320k-ds16k

Name: masterpieceexternal/gpt-oss-20b-moe-expert-power-traces-320k-ds16k
Creator: masterpieceexternal
Published: 2026-03-06 16:47:19
License: No description available

Hugging Face · Updated 2026-03-06 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/masterpieceexternal/gpt-oss-20b-moe-expert-power-traces-320k-ds16k

Resource description:

---
license: mit
task_categories:
- audio-classification
tags:
- side-channel
- chipwhisperer
- gpu
- moe
- downsampled
size_categories:
- 100K<n<1M
---

# GPT-OSS-20B MoE Expert Power Traces (Downsampled to 16k)

Downsampled variant of the 320k expert-trace capture set.

## Source

Raw source dataset (same captures):

- 32 experts (`expert_00`..`expert_31`)
- 10,000 traces per expert
- 320,000 total traces
- Raw trace length: ~195k samples per trace

## Downsampling method

Each raw trace was resampled to exactly `16384` samples using linear interpolation (`np.interp`), matching the resampling step in the training script. Neither baseline normalization nor the derivative feature is pre-applied; both are applied at training time.

## Format

For each class folder `expert_XX`:

- `traces.npy`: shape `(10000, 16384)`, dtype `float32`
- `trial_ids.npy`: shape `(10000,)`, dtype `int32` (values 0..9999)
- `meta.json`: class-level metadata

Top-level files:

- `capture_meta.json`
- `downsample_summary.json`
- `verify_summary.json` (integrity/split check output)
- `scripts/downsample_traces_to_16k.py`
- `scripts/reproduce_cnn52_from_ds16k.py`
- `scripts/verify_ds16k_dataset.py`
- `scripts/make_smoke_subset.py`
- `scripts/train_expert_classifier_multiclass.py`

## Quick start

### 1) Download

```bash
pip install -U "huggingface_hub[cli]"
hf download masterpieceexternal/gpt-oss-20b-moe-expert-power-traces-320k-ds16k \
  --repo-type dataset \
  --local-dir ./gpt-oss-ds16k
cd ./gpt-oss-ds16k
```

### 2) Verify dataset integrity and the grouped split

```bash
python scripts/verify_ds16k_dataset.py \
  --dataset-root . \
  --expect-classes 32 \
  --expect-traces-per-class 10000 \
  --expect-len 16384 \
  --val-size 10000 \
  --seed 7
```

Expected split preview:

- `train_size = 310016`
- `val_size_actual = 9984`
- `val_group_count = 312`

### 3) Smoke-test that the training script runs

```bash
python scripts/make_smoke_subset.py \
  --dataset-root . \
  --out-root /tmp/ds16k_smoke_subset \
  --per-class 64

python scripts/reproduce_cnn52_from_ds16k.py \
  --dataset-root /tmp/ds16k_smoke_subset \
  --run-dir /tmp/repro_cnn52_smoke \
  --epochs 1 \
  --batch-size 64 \
  --lr 1e-3 \
  --warmup-epochs 0 \
  --seed 7 \
  --feature-len 16384 \
  --baseline-samples 2000 \
  --val-size 512 \
  --num-workers 4 \
  --expect-min-val-acc 0.0
```

## Full reproduction (~52% validation accuracy regime)

```bash
python scripts/reproduce_cnn52_from_ds16k.py \
  --dataset-root . \
  --run-dir ./repro_cnn52 \
  --epochs 20 \
  --batch-size 32 \
  --lr 1e-3 \
  --warmup-epochs 2 \
  --seed 7 \
  --feature-len 16384 \
  --baseline-samples 2000 \
  --val-size 10000 \
  --num-workers 8 \
  --expect-min-val-acc 0.50
```

Hyperparameters mirror the reference run:

- preprocess: baseline z-score (first 2000 samples) + `dx` concat (`[x, dx]` => input dim 32768)
- split: grouped by `trial_ids`, seed 7, target val size 10,000 (actual 9,984)
- model: CNN with BatchNorm (`track_running_stats=True`), dropout 0.2
- optimizer/schedule: Adam (`lr=1e-3`, `weight_decay=0`), cosine decay, 2 warmup epochs
- training: 20 epochs, batch size 32

## Notes

- Traces come from a controlled harness that forces a single expert, not from a full, unmodified forward pass.
- Validation accuracy varies from run to run because of stochastic training and nondeterministic CUDA behavior.
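The downsampling method described above (linear interpolation with `np.interp` to exactly 16384 samples) can be sketched as follows. This is a minimal illustration, not the shipped `scripts/downsample_traces_to_16k.py`: the helper names `resample_linear`, `resample_batch` and `TARGET_LEN` are hypothetical, and the assumption that both the raw and target grids span [0, 1] with endpoints included is ours, since the card names `np.interp` but not the exact sample alignment.

```python
import numpy as np

TARGET_LEN = 16384  # output length stated in the dataset card


def resample_linear(trace, target_len=TARGET_LEN):
    """Resample a 1-D trace to `target_len` samples with np.interp.

    Assumption: source and target grids both span [0, 1] with endpoints
    included; the real script may align samples differently.
    """
    trace = np.asarray(trace, dtype=np.float64)
    src_grid = np.linspace(0.0, 1.0, num=trace.shape[0])
    dst_grid = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(dst_grid, src_grid, trace).astype(np.float32)


def resample_batch(traces, target_len=TARGET_LEN):
    """Apply resample_linear row by row to a (n_traces, raw_len) array."""
    return np.stack([resample_linear(row, target_len) for row in traces])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for raw traces of roughly 195k samples each.
    raw = rng.standard_normal((2, 195_000)).astype(np.float32)
    ds = resample_batch(raw)
    print(ds.shape, ds.dtype)  # (2, 16384) float32
```

Because linear interpolation keeps the first and last raw samples and reproduces straight-line segments exactly, it is a cheap way to shrink ~195k-sample traces by roughly 12x, at the cost of no anti-aliasing filter before decimation.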
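The per-class folder layout listed under Format can be read with plain NumPy. The sketch below is an assumption-laden illustration rather than the repository's own loader: `load_expert_class`, `iter_expert_classes` and `write_fake_dataset` are hypothetical names, integer labels are assumed to follow sorted folder order (`expert_00` -> 0), and `meta.json` is loaded as an opaque dict because the card does not describe its keys.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def load_expert_class(class_dir, mmap=True):
    """Load traces.npy, trial_ids.npy and meta.json from one expert_XX folder."""
    class_dir = Path(class_dir)
    # Memory-mapping avoids reading a full (10000, 16384) float32 array (~655 MB) at once.
    traces = np.load(class_dir / "traces.npy", mmap_mode="r" if mmap else None)
    trial_ids = np.load(class_dir / "trial_ids.npy")
    meta_path = class_dir / "meta.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    if traces.shape[0] != trial_ids.shape[0]:
        raise ValueError(
            f"{class_dir.name}: {traces.shape[0]} traces vs {trial_ids.shape[0]} trial ids"
        )
    return traces, trial_ids, meta


def iter_expert_classes(dataset_root):
    """Yield (label, name, traces, trial_ids) for each expert_XX folder.

    Assumption: labels follow sorted folder order (expert_00 -> 0, ...).
    """
    class_dirs = sorted(p for p in Path(dataset_root).glob("expert_*") if p.is_dir())
    for label, class_dir in enumerate(class_dirs):
        traces, trial_ids, _meta = load_expert_class(class_dir)
        yield label, class_dir.name, traces, trial_ids


def write_fake_dataset(root, n_classes=2, n_traces=4, length=16):
    """Hypothetical helper: write a tiny dataset with the card's folder layout."""
    root = Path(root)
    for c in range(n_classes):
        d = root / f"expert_{c:02d}"
        d.mkdir(parents=True, exist_ok=True)
        np.save(d / "traces.npy", np.zeros((n_traces, length), dtype=np.float32))
        np.save(d / "trial_ids.npy", np.arange(n_traces, dtype=np.int32))
        (d / "meta.json").write_text(json.dumps({}))
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_fake_dataset(tmp)
        for label, name, traces, trial_ids in iter_expert_classes(tmp):
            print(label, name, traces.shape, trial_ids.dtype)
```

Against the real download, `iter_expert_classes("./gpt-oss-ds16k")` would be expected to yield 32 entries with `traces.shape == (10000, 16384)`, which is the same check `scripts/verify_ds16k_dataset.py` is documented to perform.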
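The expected split preview is consistent with treating each trial id as one group of 32 traces (one per expert): 10,000 // 32 = 312 validation groups, 312 × 32 = 9,984 validation traces, and 320,000 - 9,984 = 310,016 training traces. The sketch below reproduces those counts with a hypothetical `grouped_split` helper. Its selection rule (shuffle groups with a seeded `numpy.random.default_rng`, then take whole groups until the next one would exceed the target size) is an assumption; the real script may pick different groups even with seed 7.

```python
import numpy as np


def grouped_split(groups, val_size, seed=7):
    """Return (train_idx, val_idx, n_val_groups) with no group on both sides.

    Assumption: whole groups are taken in a seeded random order until the
    next group would push the validation set past `val_size`. The card does
    not document the script's exact rule or RNG.
    """
    groups = np.asarray(groups)
    unique_groups, counts = np.unique(groups, return_counts=True)
    order = np.random.default_rng(seed).permutation(unique_groups.size)
    val_groups, total = [], 0
    for i in order:
        if total + counts[i] > val_size:
            break  # adding this group would overshoot the target size
        val_groups.append(unique_groups[i])
        total += int(counts[i])
    is_val = np.isin(groups, val_groups)
    return np.flatnonzero(~is_val), np.flatnonzero(is_val), len(val_groups)


if __name__ == "__main__":
    n_classes, per_class = 32, 10_000
    # Each class folder carries trial_ids 0..9999, so concatenating the classes
    # in order gives every trial id exactly once per expert.
    groups = np.tile(np.arange(per_class), n_classes)
    train_idx, val_idx, n_val_groups = grouped_split(groups, val_size=10_000, seed=7)
    print(len(train_idx), len(val_idx), n_val_groups)  # 310016 9984 312
```

Grouping by trial id keeps all 32 expert traces from the same trial on one side of the split, which is why the validation set lands at 9,984 rather than exactly 10,000.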
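The preprocessing line in the hyperparameter list (baseline z-score over the first 2000 samples, then `[x, dx]` concatenation to length 32768) can be sketched as below. The function name `preprocess` and the `eps` guard are hypothetical, and three details are assumptions the card does not confirm: the mean and standard deviation are computed per trace, `dx` is taken from the normalized trace, and `dx` is a first difference padded at the start so it keeps the 16384-sample length.

```python
import numpy as np


def preprocess(x, baseline_samples=2000, eps=1e-8):
    """Baseline z-score followed by a [x, dx] concat along the last axis.

    Assumptions: per-trace statistics over the first `baseline_samples`
    samples; dx is the first difference of the normalized trace, padded so
    that dx[..., 0] == 0 and the length is preserved.
    """
    x = np.asarray(x, dtype=np.float32)
    base = x[..., :baseline_samples]
    mu = base.mean(axis=-1, keepdims=True)
    sigma = base.std(axis=-1, keepdims=True)
    z = (x - mu) / (sigma + eps)  # eps guards against a flat baseline window
    dx = np.diff(z, axis=-1, prepend=z[..., :1])
    return np.concatenate([z, dx], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    batch = rng.normal(loc=3.0, scale=2.0, size=(4, 16384)).astype(np.float32)
    feats = preprocess(batch)
    print(feats.shape)  # (4, 32768)
```

Normalizing against a pre-activity baseline window removes per-capture offset and gain drift, while the difference channel emphasizes transitions in power draw; concatenating both is what doubles the input length from 16384 to 32768.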

Provider:

masterpieceexternal