johnahn/mdk-mining-controller-data
Source: Hugging Face. Updated 2026-04-21, indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/johnahn/mdk-mining-controller-data
Resource description:
---
license: mit
pretty_name: MDK Mining Controller — Synthetic Telemetry & Features
tags:
- bitcoin-mining
- predictive-maintenance
- time-series
- synthetic-data
size_categories:
- 1M<n<10M
---
# MDK Mining Controller — Data
Companion dataset for the [`mdk-mining-controller`](https://github.com/john-yo-ahn/mdk-mining-controller) prototype (3-week Tether MDK assignment).
## What's here
| File | Size | What it is |
|---|---|---|
| `raw/mining_telemetry.parquet` | 150 MB | 5.2 M rows of 1-minute telemetry for 30 ASIC miners across 120 days. 21 columns: hashrate, power, voltage, frequency, temperature, ambient temp, operating mode, is_online, hardware_model_id, miner_id, timestamp, plus failure-scenario metadata. Output of `src/synthetic/generator.py`. |
| `raw/mdk.duckdb` | 201 MB | Same telemetry as the parquet, loaded into a DuckDB database. Used by the batch pipeline for fast columnar queries. Re-deriving it from the parquet is idempotent and takes ~30 s. |
| `processed/features.v3.parquet` | 3.6 GB | 5.2 M rows × 175 features: the fully engineered feature matrix used to train the XGBoost + LSTM-AE models. Includes the TE KPI and its rolling / trend / correlation variants. Output of `src/pipeline/features.py:build_feature_matrix`. Rebuilding from raw takes ~25 min. |
All files are **reproducible from the repo's synthetic generator** — nothing here is real mining data. The dataset exists to let reviewers skip the 40-minute rebuild step.
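The row count quoted in the table follows directly from the fleet size, the simulated period, and the 1-minute sampling interval. A quick sanity check, assuming every miner has one row per minute for the full 120 days with no gaps:

```python
# Back-of-the-envelope check of the row count quoted in the table.
# Assumption: one row per miner per minute, with no missing samples.
MINERS = 30                 # ASIC miners in the synthetic fleet
DAYS = 120                  # simulated days
SAMPLES_PER_DAY = 24 * 60   # one sample per minute

expected_rows = MINERS * DAYS * SAMPLES_PER_DAY
print(f"{expected_rows:,} rows (~{expected_rows / 1e6:.1f} M)")
# prints "5,184,000 rows (~5.2 M)"
```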
## Usage
Clone the code repo first:
```bash
git clone https://github.com/john-yo-ahn/mdk-mining-controller
cd mdk-mining-controller
uv sync
```
Then download this dataset into the expected layout:
```bash
uv run python -c "
from huggingface_hub import snapshot_download
snapshot_download(
'johnahn/mdk-mining-controller-data',
repo_type='dataset',
local_dir='data',
)
"
```
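After the download finishes, it may be worth confirming that the three files landed where the code repo expects them. The sketch below uses only the relative paths from the table above; the helper name `missing_files` is hypothetical and is not part of the `mdk` CLI.

```python
from pathlib import Path

# Relative paths taken from the "What's here" table.
EXPECTED_FILES = (
    "raw/mining_telemetry.parquet",
    "raw/mdk.duckdb",
    "processed/features.v3.parquet",
)


def missing_files(data_dir="data"):
    """Return the expected files that are absent under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

Run it from the root of the cloned code repo so that the default `data` directory resolves correctly.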
The repo now has the full `data/raw/` and `data/processed/` tree. Run:
```bash
uv run mdk check # 13/13 pipeline invariants, ~11 min
uv run mdk validate # 4 end-to-end tests, ~9 min
uv run mdk # live Textual dashboard, loads real models
```
## Provenance & seeding
All artifacts were generated deterministically with `seed=42` across the generator, split, and model training. Reruns reproduce, byte for byte, the metrics recorded in the sidecar metadata at `data/models/*.metadata.json` in the code repo. The `mdk check` harness verifies this on every invocation.
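One way to picture a byte-level comparison of metrics is to serialize both sets canonically and compare the resulting bytes, with no numeric tolerance. This is only an illustrative sketch: it is not the `mdk check` implementation, and the metric names in the usage example are hypothetical because the layout of the sidecar files is not described here.

```python
import json


def canonical_bytes(metrics):
    """Serialize a metrics dict deterministically (sorted keys, fixed separators)."""
    return json.dumps(metrics, sort_keys=True, separators=(",", ":")).encode("utf-8")


def metrics_identical(rerun, recorded):
    """True only if both metric dicts serialize to exactly the same bytes."""
    return canonical_bytes(rerun) == canonical_bytes(recorded)


# Hypothetical metric names, for illustration only.
recorded = {"auc": 0.91, "f1": 0.80}
print(metrics_identical({"f1": 0.80, "auc": 0.91}, recorded))  # key order is irrelevant
print(metrics_identical({"f1": 0.80, "auc": 0.9100001}, recorded))  # any drift fails
```

Exact comparison is only meaningful because every stage is seeded; a pipeline with nondeterministic steps would need a tolerance instead.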
## Schema
### `raw/mining_telemetry.parquet` (21 columns)
`miner_id`, `timestamp`, `hashrate_th`, `power_w`, `voltage_v`, `frequency_mhz`, `temperature_c`, `ambient_temperature_c`, `operating_mode`, `is_online`, `hardware_model_id`, `hardware_model`, `hash_board_serial`, `scenario_name`, `scenario_onset_step`, `scenario_duration`, `is_pre_failure`, `fan_rpm`, `error_count`, `voltage_sag`, `hashrate_error`.
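The column list above can double as a lightweight schema check after download. The sketch below compares any list of column names against it; the helper `schema_diff` is hypothetical, and reading the actual names from the parquet file is left as a comment because it assumes `pyarrow` is installed.

```python
# Column names copied from the schema list above.
EXPECTED_RAW_COLUMNS = (
    "miner_id", "timestamp", "hashrate_th", "power_w", "voltage_v",
    "frequency_mhz", "temperature_c", "ambient_temperature_c",
    "operating_mode", "is_online", "hardware_model_id", "hardware_model",
    "hash_board_serial", "scenario_name", "scenario_onset_step",
    "scenario_duration", "is_pre_failure", "fan_rpm", "error_count",
    "voltage_sag", "hashrate_error",
)


def schema_diff(actual_columns):
    """Return (missing, unexpected) column names relative to the documented schema."""
    expected = set(EXPECTED_RAW_COLUMNS)
    actual = set(actual_columns)
    return sorted(expected - actual), sorted(actual - expected)


# With pyarrow installed, the names can be read without loading the data:
#   import pyarrow.parquet as pq
#   names = pq.read_schema("data/raw/mining_telemetry.parquet").names
#   print(schema_diff(names))
```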
### `processed/features.v3.parquet` (175 columns)
Derived from the 21 raw columns plus 34 intermediate columns. Categories:
- **Ratios** (~10): `efficiency_jth`, `temp_delta_c`, `power_per_ghz`, `voltage_deviation`, `hashrate_realization`, `te_base`, `te_adjusted`, `te_health`, …
- **Rolling statistics** (~80): `{metric}_roll_{60|360|10080}m_{mean|std|min|max}` for all 10 base signals
- **Trend features** (~20): linear-regression slopes and rate-of-change over 60/360-minute windows
- **Cross-signal correlations** (~15): rolling correlations between signal pairs such as voltage-temperature, power-hashrate, and TE-voltage
- **Diurnal features** (~6): hour-of-day, day-of-week, sine/cosine encodings
- **Cross-miner features** (~10): container-level means and deviations from container baseline
- **Labels**: `is_pre_failure` (binary target), `failure_type` (multi-class for analysis)
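
To make the rolling-statistics naming pattern concrete, here is a minimal pure-Python sketch that produces `{metric}_roll_{window}m_{stat}` columns for one signal. The windowing details are assumptions rather than the repo's behavior: a trailing window that includes the current sample, partial windows at the start of the series, and population standard deviation. The windows are in minutes, so 10080 corresponds to 7 days.

```python
from collections import deque
from statistics import fmean, pstdev


def rolling_features(values, metric, window):
    """Trailing-window mean/std/min/max for one signal, one dict per sample."""
    buf = deque(maxlen=window)  # drops the oldest sample once the window is full
    prefix = f"{metric}_roll_{window}m"
    rows = []
    for v in values:
        buf.append(v)
        rows.append({
            f"{prefix}_mean": fmean(buf),
            f"{prefix}_std": pstdev(buf),  # population std; 0.0 for a single sample
            f"{prefix}_min": min(buf),
            f"{prefix}_max": max(buf),
        })
    return rows


# Four minutes of hashrate samples, so the 60-minute window is still partial.
rows = rolling_features([100.0, 102.0, 98.0, 101.0], "hashrate_th", window=60)
print(sorted(rows[-1]))
```

A real pipeline would more likely compute these per `miner_id` with a dataframe library; the loop here is only meant to show what each column holds.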
The full schema can be enumerated via `build_feature_matrix` in `src/pipeline/features.py` in the code repo.
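As an illustration of the diurnal category, the sketch below derives hour-of-day, day-of-week, and a sine/cosine encoding of the time of day from a timestamp. The output key names are hypothetical and may not match the actual column names in `features.v3.parquet`.

```python
import math
from datetime import datetime


def diurnal_features(ts):
    """Cyclical time-of-day encoding plus plain hour and weekday for one timestamp."""
    hour_fraction = ts.hour + ts.minute / 60.0
    angle = 2.0 * math.pi * hour_fraction / 24.0  # one full turn per day
    return {
        "hour_of_day": ts.hour,
        "day_of_week": ts.weekday(),  # Monday is 0
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
    }


print(diurnal_features(datetime(2024, 1, 1, 6, 0)))
```

The sine/cosine pair places 23:59 next to 00:00 in feature space, which a raw hour value does not.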
## License
MIT. Same as the code repo.
## Citation
This is not peer-reviewed work; it is a prototype built against Tether's MDK assignment spec. If you reference it, a link back to both repos is appreciated:
- Code: https://github.com/john-yo-ahn/mdk-mining-controller
- Data: https://huggingface.co/datasets/johnahn/mdk-mining-controller-data
Provided by:
johnahn



