
johnahn/mdk-mining-controller-data

Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/johnahn/mdk-mining-controller-data
Description:
---
license: mit
pretty_name: MDK Mining Controller — Synthetic Telemetry & Features
tags:
- bitcoin-mining
- predictive-maintenance
- time-series
- synthetic-data
size_categories:
- 1M<n<10M
---

# MDK Mining Controller: Data

Companion dataset for the [`mdk-mining-controller`](https://github.com/john-yo-ahn/mdk-mining-controller) prototype (3-week Tether MDK assignment).

## What's here

| File | Size | Contents |
|---|---|---|
| `raw/mining_telemetry.parquet` | 150 MB | 5.2 M rows of 1-minute telemetry for 30 ASIC miners over 120 days. 21 columns: hashrate, power, voltage, frequency, temperature, ambient temperature, operating mode, `is_online`, `hardware_model_id`, `miner_id`, `timestamp`, plus failure-scenario metadata. Output of `src/synthetic/generator.py`. |
| `raw/mdk.duckdb` | 201 MB | The same telemetry as the Parquet file, loaded into a DuckDB database. Used by the batch pipeline for fast columnar queries. Re-deriving it from the Parquet file is idempotent and takes ~30 s. |
| `processed/features.v3.parquet` | 3.6 GB | 5.2 M rows × 175 features: the fully engineered feature matrix used to train the XGBoost and LSTM-AE models. Includes the TE KPI and its rolling, trend, and correlation variants. Output of `src/pipeline/features.py:build_feature_matrix`. Rebuilding it from raw data takes ~25 min. |

All files are **reproducible from the repo's synthetic generator**; nothing here is real mining data. The dataset exists so reviewers can skip the 40-minute rebuild step.

## Usage

Clone the code repo first:

```bash
git clone https://github.com/john-yo-ahn/mdk-mining-controller
cd mdk-mining-controller
uv sync
```

Then download this dataset into the expected layout:

```bash
uv run python -c "
from huggingface_hub import snapshot_download
snapshot_download(
    'johnahn/mdk-mining-controller-data',
    repo_type='dataset',
    local_dir='data',
)
"
```

Now the repo has the full `data/raw/` + `data/processed/` tree.
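Before running the harness, it can help to confirm that the download landed where the repo expects it. The sketch below is a hypothetical helper, not part of the repo: `missing_artifacts` and `EXPECTED_FILES` are illustrative names, and the check only tests that the three files from the table exist under the `data/` root passed as `local_dir`, without verifying sizes or checksums.

```python
from __future__ import annotations

from pathlib import Path

# Relative paths from the "What's here" table, resolved under the data/ root
# that snapshot_download(local_dir='data') populates.
EXPECTED_FILES = (
    "raw/mining_telemetry.parquet",
    "raw/mdk.duckdb",
    "processed/features.v3.parquet",
)


def missing_artifacts(data_root: str | Path = "data") -> list[str]:
    """Return the expected dataset files that are not present under data_root.

    Hypothetical helper for illustration: it checks existence only,
    not file sizes or checksums.
    """
    root = Path(data_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Called from the repo root after the download, `missing_artifacts()` should return an empty list; any entries it does return name files the snapshot did not fetch.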
Run:

```bash
uv run mdk check      # 13/13 pipeline invariants, ~11 min
uv run mdk validate   # 4 end-to-end tests, ~9 min
uv run mdk            # live Textual dashboard, loads the real models
```

## Provenance & seeding

All artifacts were generated deterministically with `seed=42` across the generator, the data split, and model training. Reruns reproduce metrics that are byte-identical to those in the sidecar metadata files under `data/models/*.metadata.json` in the code repo, and the `mdk check` harness explicitly verifies this on every invocation.

## Schema

### `raw/mining_telemetry.parquet` (21 columns)

`miner_id`, `timestamp`, `hashrate_th`, `power_w`, `voltage_v`, `frequency_mhz`, `temperature_c`, `ambient_temperature_c`, `operating_mode`, `is_online`, `hardware_model_id`, `hardware_model`, `hash_board_serial`, `scenario_name`, `scenario_onset_step`, `scenario_duration`, `is_pre_failure`, `fan_rpm`, `error_count`, `voltage_sag`, `hashrate_error`.

### `processed/features.v3.parquet` (175 columns)

Derived from the 21 raw columns plus 34 intermediate columns. Categories:

- **Ratios** (~10): `efficiency_jth`, `temp_delta_c`, `power_per_ghz`, `voltage_deviation`, `hashrate_realization`, `te_base`, `te_adjusted`, `te_health`, …
- **Rolling statistics** (~80): `{metric}_roll_{60|360|10080}m_{mean|std|min|max}` (1-hour, 6-hour, and 7-day windows) for all 10 base signals
- **Trend features** (~20): linear-regression slopes and rate of change over 60- and 360-minute windows
- **Cross-signal correlations** (~15): rolling correlations such as voltage-temperature, power-hashrate, and TE-voltage
- **Diurnal features** (~6): hour of day, day of week, and their sine/cosine encodings
- **Cross-miner features** (~10): container-level means and each miner's deviation from the container baseline
- **Labels**: `is_pre_failure` (binary target), `failure_type` (multi-class, for analysis)

The full schema can be enumerated via `build_feature_matrix` in `src/pipeline/features.py` in the code repo.

## License

MIT, same as the code repo.

## Citation

Not peer-reviewed work: a prototype built against Tether's MDK assignment spec.
If you reference it, a link back to both repos is appreciated:

- Code: https://github.com/john-yo-ahn/mdk-mining-controller
- Data: https://huggingface.co/datasets/johnahn/mdk-mining-controller-data
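The ratio, diurnal, and rolling-statistic feature families listed under Schema can be illustrated with a minimal, stdlib-only Python sketch. It is not the repo's `build_feature_matrix`. The function names are hypothetical. `efficiency_jth` assumes `power_w` is in watts and `hashrate_th` is in TH/s, so W / (TH/s) gives J/TH. The hour encoding is the standard cyclic sin/cos mapping rather than a confirmed detail of the pipeline. Only the rolling column-name pattern is taken directly from the schema.

```python
from __future__ import annotations

import math
from statistics import mean, pstdev

# Window lengths (minutes) and statistics from the rolling-feature pattern
# {metric}_roll_{60|360|10080}m_{mean|std|min|max} in the Schema section.
WINDOWS_MIN = (60, 360, 10080)
STATS = ("mean", "std", "min", "max")


def efficiency_jth(power_w: float, hashrate_th: float) -> float:
    """Joules per terahash, W / (TH/s) = J/TH (assumes hashrate_th is TH/s)."""
    # Offline miners report zero hashrate; return NaN instead of dividing by 0.
    return power_w / hashrate_th if hashrate_th > 0 else math.nan


def diurnal_encoding(hour_of_day: float) -> tuple[float, float]:
    """Map an hour in [0, 24) onto the unit circle so 23:00 and 00:00 are adjacent."""
    angle = 2 * math.pi * hour_of_day / 24
    return math.sin(angle), math.cos(angle)


def rolling_features(metric: str, values: list[float]) -> dict[str, float]:
    """Trailing-window stats at the last sample of a non-empty 1-minute series.

    Windows longer than the series use all available history (an assumption;
    the real pipeline may handle warm-up rows differently, e.g. with NaN).
    """
    out: dict[str, float] = {}
    for w in WINDOWS_MIN:
        window = values[-w:]  # last w one-minute samples
        stats = {
            "mean": mean(window),
            "std": pstdev(window),  # population std; the repo may use sample std
            "min": min(window),
            "max": max(window),
        }
        for s in STATS:
            out[f"{metric}_roll_{w}m_{s}"] = stats[s]
    return out
```

Three windows times four statistics yields 12 columns per base signal. Applying this to all 10 base signals would give 120 columns, so the ~80 count in the schema suggests the real pipeline computes only a subset of the combinations.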
Provider:
johnahn