masterpieceexternal/gpt-oss-20b-moe-expert-power-traces-320k-ds16k
Hugging Face · Updated 2026-03-06 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/masterpieceexternal/gpt-oss-20b-moe-expert-power-traces-320k-ds16k
Resource description:
---
license: mit
task_categories:
- audio-classification
tags:
- side-channel
- chipwhisperer
- gpu
- moe
- downsampled
size_categories:
- 100K<n<1M
---
# GPT-OSS-20B MoE Expert Power Traces (Downsampled to 16k)
Downsampled variant of the 320k expert-trace capture set.
## Source
Raw source dataset (same captures):
- 32 experts (`expert_00`..`expert_31`)
- 10,000 traces per expert
- 320,000 total traces
- raw trace length ~195k samples per trace
## Downsampling method
Each raw trace was resampled to exactly `16384` samples using linear interpolation (`np.interp`), matching the trainer's resampling step.
Neither baseline normalization nor the derivative feature is pre-baked; both are applied at train time.
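The resampling step can be sketched roughly as follows. This is a minimal illustration rather than the contents of `scripts/downsample_traces_to_16k.py`: the function name `resample_linear` is hypothetical, and the choice of a target grid whose endpoints coincide with the first and last raw samples is an assumption.

```python
import numpy as np

TARGET_LEN = 16384  # downsampled length stated on this card


def resample_linear(trace, target_len=TARGET_LEN):
    """Linearly interpolate a 1-D trace onto target_len evenly spaced points."""
    trace = np.asarray(trace, dtype=np.float64)
    src_pos = np.arange(trace.shape[0])
    # Assumption: the new grid spans the raw trace from its first to its last sample.
    dst_pos = np.linspace(0, trace.shape[0] - 1, num=target_len)
    return np.interp(dst_pos, src_pos, trace).astype(np.float32)


# Synthetic stand-in for a raw capture of roughly 195k samples.
raw = np.random.default_rng(0).normal(size=195_000)
ds = resample_linear(raw)
print(ds.shape, ds.dtype)
```

Plain linear interpolation applies no anti-aliasing filter, so content above the new Nyquist rate can fold into the downsampled trace.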
## Format
For each class folder `expert_XX`:
- `traces.npy`: shape `(10000, 16384)`, dtype `float32`
- `trial_ids.npy`: shape `(10000,)`, dtype `int32` (0..9999)
- `meta.json`: class-level metadata
Top-level files:
- `capture_meta.json`
- `downsample_summary.json`
- `verify_summary.json` (integrity/split check output)
- `scripts/downsample_traces_to_16k.py`
- `scripts/reproduce_cnn52_from_ds16k.py`
- `scripts/verify_ds16k_dataset.py`
- `scripts/make_smoke_subset.py`
- `scripts/train_expert_classifier_multiclass.py`
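Given the per-class layout above, one way to read a single class folder is sketched below. The helper name `load_expert` is hypothetical, and memory-mapping is a suggestion rather than something the dataset requires: one `traces.npy` of shape `(10000, 16384)` in `float32` is about 655 MB. The keys inside `meta.json` are not documented here, so the sketch returns the parsed JSON without assuming any.

```python
import json
from pathlib import Path

import numpy as np


def load_expert(dataset_root, expert_idx):
    """Load traces, trial ids, and metadata for one expert_XX folder."""
    class_dir = Path(dataset_root) / f"expert_{expert_idx:02d}"
    # mmap_mode="r" keeps the large trace array on disk until rows are accessed.
    traces = np.load(class_dir / "traces.npy", mmap_mode="r")
    trial_ids = np.load(class_dir / "trial_ids.npy")
    meta = json.loads((class_dir / "meta.json").read_text())
    return traces, trial_ids, meta
```

From inside the downloaded dataset directory, `traces, trial_ids, meta = load_expert(".", 0)` would read `expert_00`.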
## Quick start
### 1) Download
```bash
pip install -U "huggingface_hub[cli]"
hf download masterpieceexternal/gpt-oss-20b-moe-expert-power-traces-320k-ds16k \
  --repo-type dataset \
  --local-dir ./gpt-oss-ds16k
cd ./gpt-oss-ds16k
```
### 2) Verify dataset integrity and grouped split
```bash
python scripts/verify_ds16k_dataset.py \
  --dataset-root . \
  --expect-classes 32 \
  --expect-traces-per-class 10000 \
  --expect-len 16384 \
  --val-size 10000 \
  --seed 7
```
Expected split preview:
- `train_size = 310016`
- `val_size_actual = 9984`
- `val_group_count = 312`
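The numbers above are consistent with a split that moves whole `trial_ids` groups into validation, assuming each trial id occurs once per expert so that a group holds 32 traces. The sketch below reproduces the arithmetic under that assumption. The selection helper `pick_val_trial_ids` is hypothetical and will not necessarily pick the same trial ids as `scripts/verify_ds16k_dataset.py`, whose random number generator is not described here.

```python
import numpy as np

NUM_CLASSES = 32
TRACES_PER_CLASS = 10_000
TARGET_VAL_SIZE = 10_000


def grouped_split_sizes(num_classes, traces_per_class, target_val_size):
    # Assumption: each trial id occurs once per class, so one group = num_classes traces.
    group_size = num_classes
    val_groups = target_val_size // group_size  # only whole groups go to validation
    val_size = val_groups * group_size
    return {
        "train_size": num_classes * traces_per_class - val_size,
        "val_size_actual": val_size,
        "val_group_count": val_groups,
    }


def pick_val_trial_ids(traces_per_class, val_groups, seed):
    # Hypothetical selection rule: shuffle trial ids and take the first val_groups.
    rng = np.random.default_rng(seed)
    return np.sort(rng.permutation(traces_per_class)[:val_groups])


sizes = grouped_split_sizes(NUM_CLASSES, TRACES_PER_CLASS, TARGET_VAL_SIZE)
val_ids = pick_val_trial_ids(TRACES_PER_CLASS, sizes["val_group_count"], seed=7)
print(sizes)
# {'train_size': 310016, 'val_size_actual': 9984, 'val_group_count': 312}
```

Keeping every trace that shares a trial id on the same side of the split prevents a single trial from contributing to both training and validation.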
### 3) Smoke-test that the training script runs
```bash
python scripts/make_smoke_subset.py \
  --dataset-root . \
  --out-root /tmp/ds16k_smoke_subset \
  --per-class 64

python scripts/reproduce_cnn52_from_ds16k.py \
  --dataset-root /tmp/ds16k_smoke_subset \
  --run-dir /tmp/repro_cnn52_smoke \
  --epochs 1 \
  --batch-size 64 \
  --lr 1e-3 \
  --warmup-epochs 0 \
  --seed 7 \
  --feature-len 16384 \
  --baseline-samples 2000 \
  --val-size 512 \
  --num-workers 4 \
  --expect-min-val-acc 0.0
```
## Full reproduction (~52% regime)
```bash
python scripts/reproduce_cnn52_from_ds16k.py \
  --dataset-root . \
  --run-dir ./repro_cnn52 \
  --epochs 20 \
  --batch-size 32 \
  --lr 1e-3 \
  --warmup-epochs 2 \
  --seed 7 \
  --feature-len 16384 \
  --baseline-samples 2000 \
  --val-size 10000 \
  --num-workers 8 \
  --expect-min-val-acc 0.50
```
Hyperparameters mirror the reference run:
- preprocess: baseline z-score (first 2000 samples) + `dx` concat (`[x, dx]` => input dim 32768)
- split: grouped by `trial_ids`, seed 7, target val size 10,000 (actual 9,984)
- model: CNN with BatchNorm (`track_running_stats=True`), dropout 0.2
- optimizer/schedule: Adam (`lr=1e-3`, `weight_decay=0`), cosine decay, warmup 2 epochs
- training: 20 epochs, batch size 32
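The preprocessing line above can be illustrated with a small NumPy sketch. It is an approximation under stated assumptions, not the code in `scripts/reproduce_cnn52_from_ds16k.py`: the z-score statistics are taken per trace from the first 2000 samples, `dx` is a first difference padded with a leading zero so it keeps the same length, and `[x, dx]` is concatenated along the sample axis. The function name `preprocess` and the `eps` guard are hypothetical.

```python
import numpy as np

BASELINE_SAMPLES = 2000  # value of --baseline-samples


def preprocess(traces, baseline_samples=BASELINE_SAMPLES, eps=1e-8):
    """Baseline z-score each trace, then append its first difference."""
    x = np.asarray(traces, dtype=np.float32)
    baseline = x[:, :baseline_samples]
    mu = baseline.mean(axis=1, keepdims=True)
    sigma = baseline.std(axis=1, keepdims=True)
    x = (x - mu) / (sigma + eps)  # eps guards against a flat baseline
    # Assumption: dx[0] is 0 because the first sample is prepended before differencing.
    dx = np.diff(x, axis=1, prepend=x[:, :1])
    return np.concatenate([x, dx], axis=1)


batch = np.random.default_rng(7).normal(size=(4, 16384)).astype(np.float32)
features = preprocess(batch)
print(features.shape)  # (4, 32768)
```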
## Notes
- Traces were captured in a controlled harness that forces a single expert; they do not come from a full, unmodified forward pass.
- Validation accuracy varies from run to run because of stochastic training and CUDA behavior.
Provider:
masterpieceexternal



