heuristic-eval-labs/HESDL-Node-Telemetry-Blobs-v1

Hugging Face · Updated 2026-03-27 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/heuristic-eval-labs/HESDL-Node-Telemetry-Blobs-v1
Resource description:
---
viewer: false
license: openrail
task_categories:
- time-series-forecasting
- tabular-classification
tags:
- synthetic
- telemetry
- heuristic-evaluation
- distributed-systems
- infrastructure
pretty_name: HESDL Synthetic Node Telemetry
size_categories:
- 100K<n<1M
language:
- en
---

# Dataset Card for HESDL Synthetic Node Telemetry

## Dataset Description

- **Homepage:** https://heuristic-labs.org/research/telemetry
- **Repository:** Heuristic Evaluation and Synthetic Data Labs (HESDL)
- **Paper:** N/A (Internal Whitepaper - HESDL-TR-2026-04)
- **Point of Contact:** sysadmin@heuristic-labs.org

### Dataset Summary

This dataset contains aggregated, anonymized, and synthetic telemetry payloads generated across simulated distributed nodes within the HESDL infrastructure. The primary objective of this corpus is to provide a baseline for evaluating heuristic anomaly detection algorithms in high-throughput, unstructured blob-storage environments.

Due to the nature of the simulated stress tests, the dataset includes large binary objects (blobs), fragmented log sequences, and unstructured payload dumps to accurately reflect real-world network degradation and storage saturation scenarios.

### Supported Tasks and Leaderboards

- `anomaly-detection`: The dataset can be used to train models to identify corrupted blobs or irregular telemetry spikes.
- `state-reconstruction`: Testing automated recovery protocols using fragmented data instances.

### Languages

The underlying structured metadata is in English (`en`). Binary payloads and blob objects are intentionally obfuscated or machine-encoded and do not represent natural language.

## Dataset Structure

### Data Instances

A typical instance in this dataset represents a single node's state dump at a specific timestamp.
```json
{
  "node_id": "hesdl-worker-cluster-7-node-402",
  "timestamp": 1773243100,
  "payload_type": "opaque_blob",
  "blob_reference": "data/part-0042-8a9b.bin",
  "checksum_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "heuristic_flag": 0
}
```

### Data Fields

* `node_id`: A string identifier for the synthetic worker node.
* `timestamp`: UNIX epoch timestamp of the telemetry dump.
* `payload_type`: Categorical descriptor of the file format (mostly raw or opaque blobs).
* `blob_reference`: Pointer to the large unstructured files stored within the repository.
* `checksum_sha256`: Hash for data integrity validation.
* `heuristic_flag`: Integer (0 or 1) indicating if the generation cycle was flagged for induced degradation.

### Data Splits

The data is not split into traditional train/test sets, as it is intended for unsupervised heuristic evaluation. It is partitioned chronologically by generation batch.

## Dataset Creation

### Curation Rationale

Standard telemetry datasets often fail to capture the chaotic nature of binary degradation in distributed storage. HESDL generated this corpus to fill the gap, providing raw, unfiltered, and heavy payload files that mimic catastrophic system states.

### Source Data

All data is strictly synthetic or heavily obfuscated. No real user data, personally identifiable information (PII), or production network traffic is included in this repository.

## Considerations for Using the Data

### Social Impact of Dataset

This dataset is strictly infrastructural and mathematical. It has no direct social impact, as it pertains entirely to the field of systems architecture and synthetic data generation.

### Limitations

The payloads are unstructured and may require custom parsers depending on the evaluation framework used. Some binary files are intentionally corrupted to simulate hardware failure.

### Disclaimers

This repository is maintained for internal benchmarking by the Heuristic Evaluation and Synthetic Data Labs. External usage is permitted under the specified license, but no technical support or schema stability guarantees are provided.
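
The `checksum_sha256` and `blob_reference` fields described above suggest a simple integrity check before a blob is handed to a custom parser. The following is a minimal sketch, not part of the dataset's tooling: the function names (`sha256_of_file`, `verify_instance`, `flagged_in_order`) are hypothetical, and it assumes that `blob_reference` is a path relative to a local copy of the repository and that each metadata record has already been loaded as a Python dictionary with the fields listed in the card.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large blobs never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_instance(instance: dict, repo_root: Path) -> bool:
    """Return True if the referenced blob exists and matches its recorded checksum.

    Assumption: blob_reference is relative to repo_root (a local clone).
    """
    blob_path = repo_root / instance["blob_reference"]
    if not blob_path.is_file():
        return False
    return sha256_of_file(blob_path) == instance["checksum_sha256"].lower()


def flagged_in_order(instances: list[dict]) -> list[dict]:
    """Keep instances with heuristic_flag == 1, sorted by timestamp (oldest first)."""
    flagged = [item for item in instances if item["heuristic_flag"] == 1]
    return sorted(flagged, key=lambda item: item["timestamp"])
```

A mismatch does not necessarily mean a download error: the card states that some binary files are intentionally corrupted, and it does not say whether the recorded checksum reflects the blob before or after corruption. Note also that the checksum in the sample instance equals the SHA-256 digest of empty input, so that particular record would only verify against a zero-byte file.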
Provider:
heuristic-eval-labs