emergentphysicslab/waveguard-benchmarks
Hugging Face · Updated 2026-04-06 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/emergentphysicslab/waveguard-benchmarks
---
license: mit
task_categories:
- tabular-classification
tags:
- anomaly-detection
- time-series
- time-series-classification
- server-monitoring
- cybersecurity
- benchmark
- physics
- waveguard
- zero-training
- iot
- financial-data
pretty_name: WaveGuard Anomaly Detection Benchmarks
size_categories:
- 1K<n<10K
---
# WaveGuard Anomaly Detection Benchmarks
Curated benchmark datasets for evaluating time-series and tabular anomaly detection models.
Each dataset includes labeled training (normal) and test (mixed normal + anomalous) splits.
## Datasets
### 1. Server Metrics (`server_metrics/`)
Simulated server health metrics with injected failure events.
- **Features**: cpu, memory, disk_io, network, errors (5 numeric)
- **Training**: 500 normal samples
- **Test**: 100 samples (15 anomalous)
- **Anomaly types**: CPU spike, memory leak, disk saturation, network flood
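The four failure types are easiest to picture as distortions of a normal metric stream. The sketch below is purely illustrative. Its baselines, noise levels, and anomaly magnitudes are assumptions, and it is not the generator used to build `server_metrics/`. The helper names `normal_metrics` and `inject` are hypothetical.

```python
import numpy as np

# Column order follows the feature list above; baselines and noise levels are assumptions.
FEATURES = ["cpu", "memory", "disk_io", "network", "errors"]


def normal_metrics(n: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical healthy-server generator: noisy readings around fixed baselines."""
    base = np.array([40.0, 55.0, 20.0, 30.0, 0.5])
    noise = np.array([5.0, 3.0, 4.0, 5.0, 0.5])
    x = base + rng.normal(0.0, 1.0, size=(n, len(FEATURES))) * noise
    return np.clip(x, 0.0, None)  # metrics are non-negative


def inject(x, labels, kind: str, start: int, length: int):
    """Inject one failure type into rows [start, start + length) and mark them anomalous."""
    x, labels = x.copy(), labels.copy()
    rows = slice(start, start + length)
    if kind == "cpu_spike":
        x[rows, 0] = 98.0                               # CPU pinned near 100%
    elif kind == "memory_leak":
        x[rows, 1] += np.linspace(0.0, 40.0, length)    # memory climbs steadily
    elif kind == "disk_saturation":
        x[rows, 2] = 100.0                              # disk I/O maxed out
        x[rows, 4] += 5.0                               # assumed: errors rise alongside
    elif kind == "network_flood":
        x[rows, 3] *= 10.0                              # traffic jumps tenfold
    else:
        raise ValueError(f"unknown anomaly type: {kind}")
    labels[rows] = True
    return x, labels


rng = np.random.default_rng(0)
x = normal_metrics(100, rng)
labels = np.zeros(100, dtype=bool)
for kind, start, length in [("cpu_spike", 10, 3), ("memory_leak", 30, 6),
                            ("disk_saturation", 60, 3), ("network_flood", 85, 3)]:
    x, labels = inject(x, labels, kind, start, length)
print(int(labels.sum()))  # 15
```

The injection lengths in the usage lines were chosen to total 15 anomalous rows out of 100. This matches the test split's ratio for illustration only and says nothing about how the real split was built.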
### 2. Crypto Price Anomalies (`crypto_prices/`)
Real cryptocurrency OHLCV data (BTC, ETH, SOL) from 2021-2026 with labeled flash crashes and pump events.
- **Features**: open, high, low, close, volume (5 numeric per coin)
- **Training**: 1200 normal daily candles per coin
- **Test**: 600 candles per coin (labeled anomalies at known events)
- **Source**: Yahoo Finance via yfinance
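The card states that anomalies are labeled at known events, not by a rule. When exploring the candles, a rough sanity check is a return-threshold heuristic like the one below, which surfaces candidate flash crashes and pumps. The 15% threshold and the function name `flag_large_moves` are assumptions, and its output should not be treated as the dataset's labels. Only the `close` column from the feature list is used.

```python
import pandas as pd


def flag_large_moves(candles: pd.DataFrame, threshold: float = 0.15) -> pd.Series:
    """Label each daily candle 'crash', 'pump', or 'normal'.

    Hypothetical heuristic: a day is flagged when its close differs from the
    previous close by more than `threshold` (a fraction, 0.15 = 15%).
    """
    ret = candles["close"] / candles["close"].shift(1) - 1.0  # first row is NaN
    labels = pd.Series("normal", index=candles.index)
    labels.loc[ret <= -threshold] = "crash"  # NaN compares False, so row 0 stays normal
    labels.loc[ret >= threshold] = "pump"
    return labels


candles = pd.DataFrame({"close": [101.0, 100.0, 80.0, 82.0, 100.0]})
print(flag_large_moves(candles).tolist())
# ['normal', 'normal', 'crash', 'normal', 'pump']
```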
### 3. Synthetic Time Series (`synthetic_timeseries/`)
Controlled synthetic signals with known anomaly injection points.
- **Patterns**: sinusoidal, trend, seasonal, random walk
- **Anomaly types**: point (spike), contextual (subtle shift), collective (regime change)
- **Training**: 200 clean windows per pattern
- **Test**: 50 windows per pattern (10 of them anomalous)
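Windows like these can be produced along the lines of the minimal sketch below. It assumes a window length of 128, unit-scale signals, and the magnitudes shown. The card says the actual creation parameters are recorded in each dataset's `metadata.json`, so every number and helper name here (`make_pattern`, `inject_anomaly`, `make_test_set`) is hypothetical.

```python
import numpy as np

WINDOW = 128  # assumed window length; the card does not state one


def make_pattern(kind: str, rng: np.random.Generator, n: int = WINDOW) -> np.ndarray:
    """Generate one clean window of the given base pattern (parameters are illustrative)."""
    t = np.arange(n)
    noise = rng.normal(0.0, 0.1, n)
    if kind == "sinusoidal":
        return np.sin(2 * np.pi * t / 32) + noise
    if kind == "trend":
        return 0.02 * t + noise
    if kind == "seasonal":
        # slow cycle plus a faster harmonic
        return np.sin(2 * np.pi * t / 64) + 0.5 * np.sin(2 * np.pi * t / 16) + noise
    if kind == "random_walk":
        return np.cumsum(rng.normal(0.0, 0.1, n))
    raise ValueError(f"unknown pattern: {kind}")


def inject_anomaly(x: np.ndarray, kind: str, at: int, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of x with one point, contextual, or collective anomaly starting at `at`."""
    y = x.copy()
    scale = x.std()
    if kind == "point":          # single large spike
        y[at] += 5.0 * scale
    elif kind == "contextual":   # subtle level shift that persists to the end of the window
        y[at:] += 0.5 * scale
    elif kind == "collective":   # 16-step segment switches to a high-variance regime
        seg = slice(at, min(at + 16, len(y)))
        y[seg] = y[seg].mean() + rng.normal(0.0, 2.0 * scale, seg.stop - seg.start)
    else:
        raise ValueError(f"unknown anomaly type: {kind}")
    return y


def make_test_set(pattern: str, rng: np.random.Generator,
                  n_windows: int = 50, n_anomalous: int = 10):
    """Build n_windows windows of one pattern, n_anomalous of them with an injected anomaly."""
    anomalous = set(rng.choice(n_windows, size=n_anomalous, replace=False).tolist())
    kinds = ["point", "contextual", "collective"]
    windows, labels = [], []
    n_injected = 0
    for i in range(n_windows):
        w = make_pattern(pattern, rng)
        if i in anomalous:
            at = int(rng.integers(16, WINDOW - 16))
            w = inject_anomaly(w, kinds[n_injected % len(kinds)], at, rng)
            n_injected += 1
        windows.append(w)
        labels.append(i in anomalous)
    return np.stack(windows), np.array(labels)


rng = np.random.default_rng(42)
X, y = make_test_set("seasonal", rng)
print(X.shape, int(y.sum()))  # (50, 128) 10
```

The anomaly kind rotates with each injection, so 10 anomalous windows split 4/3/3 across point, contextual, and collective anomalies.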
## Format
Each dataset is provided as two Parquet files plus a JSON metadata file:
```
dataset_name/
├── train.parquet    # Normal samples only
├── test.parquet     # Mixed normal + anomalous
└── metadata.json    # Feature descriptions, anomaly counts, creation params
```
## Usage
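```python
from datasets import load_dataset

# "server_metrics" selects one benchmark subset (see Datasets above).
# If the repo does not define it as a config, load_dataset(..., data_dir="server_metrics")
# picks up that folder's train/test Parquet files instead.
ds = load_dataset("gpartin/waveguard-benchmarks", "server_metrics")
train = ds["train"].to_pandas()  # normal samples only
test = ds["test"].to_pandas()    # mixed normal + anomalous samples
```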
## Evaluation Protocol
1. Train/fit your detector on `train.parquet` only
2. Score each row in `test.parquet`
3. Report: Precision, Recall, F1, AUC-ROC, Average Latency
4. Compare against the WaveGuard baseline reported in the model card
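Steps 1 to 3 can be sketched with NumPy alone. `ZScoreDetector` is a deliberately simple, hypothetical stand-in for "your detector" (it is not WaveGuard). The 3.0 decision threshold is an assumption, and "Average Latency" is interpreted here as the mean per-row scoring time, which is also an assumption. Labels are passed as a separate boolean array because the card does not name the label column.

```python
import time

import numpy as np


class ZScoreDetector:
    """Toy stand-in detector: largest absolute z-score across features."""

    def fit(self, train: np.ndarray) -> "ZScoreDetector":
        self.mu = train.mean(axis=0)
        self.sigma = train.std(axis=0) + 1e-9  # guard against constant features
        return self

    def score(self, rows: np.ndarray) -> np.ndarray:
        return np.abs((rows - self.mu) / self.sigma).max(axis=1)


def auc_roc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Probability that a random anomaly outscores a random normal row (ties count half)."""
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def evaluate(detector, train: np.ndarray, test: np.ndarray, labels: np.ndarray,
             threshold: float = 3.0) -> dict:
    detector.fit(train)                      # step 1: fit on normal training rows only
    scores, latencies = [], []
    for row in test:                         # step 2: score each test row individually
        t0 = time.perf_counter()
        scores.append(detector.score(row[None, :])[0])
        latencies.append(time.perf_counter() - t0)
    scores = np.asarray(scores)
    pred = scores > threshold                # assumed decision threshold
    tp = int(np.sum(pred & labels))
    fp = int(np.sum(pred & ~labels))
    fn = int(np.sum(~pred & labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {                                 # step 3: the metrics the protocol asks for
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc_roc": auc_roc(scores, labels),
        "avg_latency_ms": 1000.0 * float(np.mean(latencies)),
    }


# Synthetic stand-in data; real runs would use the Parquet splits instead.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 5))
test = rng.normal(size=(100, 5))
labels = np.zeros(100, dtype=bool)
labels[:15] = True
test[labels] += 10.0                         # shift 15 rows far from the training distribution
print(evaluate(ZScoreDetector(), train, test, labels))
```

With the DataFrames from the Usage snippet, the `server_metrics` feature matrix would be `train[["cpu", "memory", "disk_io", "network", "errors"]].to_numpy()` (likewise for `test`). The boolean `labels` array has to be built from whichever label column the test file actually uses.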
## Citation
```bibtex
@dataset{waveguard_benchmarks2025,
title={WaveGuard Anomaly Detection Benchmarks},
author={Partin, Greg},
year={2025},
url={https://huggingface.co/datasets/gpartin/waveguard-benchmarks}
}
```
Provided by:
emergentphysicslab



