
FiN-2: Large-Scale Powerline Communication Dataset (Pt.1)

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/8328112
Resource description:
# FiN-2 Large-Scale Real-World PLC-Dataset

## About

#### FiN-2 dataset in a nutshell:

FiN-2 is the first large-scale real-world dataset on data collected in a powerline communication (PLC) infrastructure. Since the electricity grid is inherently a graph, our dataset can be interpreted as a graph dataset. Therefore, we use the word node to describe points of measurement (cable distribution cabinets) within the low-voltage electricity grid and the word edge to describe the connections (cables) between them. However, since these are PLC connections, an edge does not necessarily correspond to a real cable; more on this in our paper.

FiN-2 contains measurements that relate to the nodes (voltage, total harmonic distortion) as well as to the edges (signal-to-noise ratio spectrum, tonemap). In total, FiN-2 is distributed across three different sites with a total of 1,930,762,116 node measurements each for the individual features and 638,394,025 edge measurements each for all 917 PLC channels. All data was collected over a 25-month period from mid-2020 to the end of 2022.

We propose this dataset to foster research in the domain of grid automation and smart grid. Therefore, we provide different example use cases in asset management, grid state visualization, forecasting, predictive maintenance, and novelty detection.

For more detailed information on this dataset, please see our [paper](https://arxiv.org/abs/2209.12693).

* * *

## Content

The FiN-2 dataset is split into two compressed CSV files: *nodes.csv* and *edges.csv*. All files are provided as a compressed ZIP file and are divided into four parts.
The first part can be found in this repo, while the remaining parts can be found in the following:

- https://zenodo.org/record/8328105
- https://zenodo.org/record/8328108
- https://zenodo.org/record/8328111

### Node data

| id | ts | v1 | v2 | v3 | thd1 | thd2 | thd3 | phase_angle1 | phase_angle2 | phase_angle3 | temp |
|----|----|----|----|----|----|----|----|----|----|----|----|
|112|1605530460|236.5|236.4|236.0|2.9|2.5|2.4|120.0|119.8|120.0|35.3|
|112|1605530520|236.9|236.6|236.6|3.1|2.7|2.5|120.1|119.8|120.0|35.3|
|112|1605530580|236.2|236.4|236.0|3.1|2.7|2.5|120.0|120.0|119.9|35.5|

- id / ts: Unique identifier of the measured node and timestamp of the measurement
- v1/v2/v3: Voltage measurements of all three phases
- thd1/thd2/thd3: Total harmonic distortion of all three phases
- phase_angle1/2/3: Phase angle of all three phases
- temp: In-circuit temperature of the sensor inside a cable distribution unit (in °C)

### Edge data

| src | dst | ts | snr0 | snr1 | snr2 | ... | snr916 |
|----|----|----|----|----|----|----|----|
|62|94|1605528900|70|72|45|...|-53|
|62|32|1605529800|16|24|13|...|-51|
|17|94|1605530700|37|25|24|...|-55|

- src & dst & ts: Unique identifiers of the source and target nodes between which the spectrum is measured, and time of the measurement
- snr0/snr1/.../snr916: 917 SNR measurements in tenths of a decibel (e.g. 50 --> 5 dB)

### Metadata

The metadata provided along with the data covers:

- Number of cable joints
- Cable properties (length, type, number of sections)
- Relative position of the nodes (location, zero-centered GPS)
- Adjacent PV or wallbox installations
- Year of installation of the nodes and cables

Since the electricity grid is part of the critical infrastructure, it is not possible to provide exact GPS locations.
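The column conventions described above for the node and edge tables can be illustrated with a small sketch. It loads the sample rows shown in the tables (with only the first three SNR channels), converts the `ts` column to timestamps, and rescales SNR values from tenths of a decibel to decibels. Two points are assumptions rather than statements from the dataset description: that `ts` is in Unix epoch seconds (the sample values are consistent with November 2020), and the helper column name `time`.

```python
import io

import pandas as pd

# Sample rows copied from the tables above; only snr0..snr2 are included here.
nodes_csv = """id,ts,v1,v2,v3,thd1,thd2,thd3,phase_angle1,phase_angle2,phase_angle3,temp
112,1605530460,236.5,236.4,236.0,2.9,2.5,2.4,120.0,119.8,120.0,35.3
112,1605530520,236.9,236.6,236.6,3.1,2.7,2.5,120.1,119.8,120.0,35.3
112,1605530580,236.2,236.4,236.0,3.1,2.7,2.5,120.0,120.0,119.9,35.5
"""
edges_csv = """src,dst,ts,snr0,snr1,snr2
62,94,1605528900,70,72,45
62,32,1605529800,16,24,13
17,94,1605530700,37,25,24
"""

nodes = pd.read_csv(io.StringIO(nodes_csv))
edges = pd.read_csv(io.StringIO(edges_csv))

# Assumption: ts holds Unix epoch seconds.
nodes["time"] = pd.to_datetime(nodes["ts"], unit="s", utc=True)
edges["time"] = pd.to_datetime(edges["ts"], unit="s", utc=True)

# SNR is stored in tenths of a decibel, so divide by 10 to obtain dB.
snr_cols = [c for c in edges.columns if c.startswith("snr")]
edges_db = edges.copy()
edges_db[snr_cols] = edges[snr_cols] / 10.0

print(nodes[["id", "time", "v1", "v2", "v3"]])
print(edges_db[["src", "dst", "time"] + snr_cols])
```

With the full files, the same `snr_cols` selection would pick up all 917 channels (`snr0` to `snr916`).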
* * *

## Usage

Simple data access using pandas:

```
import pandas as pd

nodes_file = "nodes.csv.gz"  # /path/to/nodes.csv.gz
edges_file = "edges.csv.gz"  # /path/to/edges.csv.gz

# read the first 10 rows
data = pd.read_csv(nodes_file, nrows=10, compression='gzip')

# skip the first 5 data rows (the header in line 0 is kept),
# then read the next 10 rows, i.e. data rows 6 to 15
data = pd.read_csv(nodes_file, nrows=10, skiprows=range(1, 6), compression='gzip')

# ... same for the edges
```

The compressed CSV format was chosen to make sharing as easy as possible; however, it comes with significant drawbacks for machine learning. Because of the inherent graph structure, a single snapshot of the whole graph consists of a set of node and edge measurements. Due to timeouts, noise, and other disturbances, nodes sometimes fail to collect data, so the number of measurements differs from timestamp to timestamp. This, together with the high sparsity of the graph, makes the CSV format highly inefficient for ML training. To use the data in an ML pipeline, we recommend other data formats like [datadings](https://datadings.readthedocs.io/en/latest/) or specialized database solutions like [VictoriaMetrics](https://victoriametrics.com/).

### Example use case (voltage forecasting)

Forecasting the voltage is one potential use case. The Jupyter notebook provided in the repository gives an overview of how the dataset can be loaded, preprocessed, and used for ML training. There, MinMax scaling is used as simple preprocessing, and a PyTorch dataset class is created to handle the data. A vanilla autoencoder is then used to process the voltage and forecast it into the future.
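The preprocessing idea mentioned for the forecasting use case can be sketched in a few lines. This is not the notebook from the repository; it is a minimal NumPy-only illustration of min-max scaling followed by slicing a series into input/target windows, which is one common way to prepare data for a forecasting model. The function names `min_max_scale` and `make_windows`, the window lengths, and the synthetic voltage series are all assumptions made for illustration.

```python
import numpy as np


def min_max_scale(x):
    """Scale x linearly to [0, 1]; return the bounds so the scaling can be undone."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), lo, hi


def make_windows(series, n_in, n_out):
    """Slice a 1-D series into (input, target) pairs of length n_in and n_out."""
    inputs, targets = [], []
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return np.array(inputs), np.array(targets)


# Synthetic placeholder series (NOT taken from FiN-2): a slow oscillation around 236 V.
t = np.arange(100)
v1 = 236.0 + 0.5 * np.sin(2 * np.pi * t / 24)

scaled, lo, hi = min_max_scale(v1)
X, y = make_windows(scaled, n_in=12, n_out=3)

print(X.shape, y.shape)

# Predictions made in the scaled space can be mapped back to volts like this:
restored = scaled * (hi - lo) + lo
```

The arrays `X` and `y` could then be wrapped in a PyTorch dataset class, as the notebook is described as doing. Keeping `lo` and `hi` from the training split avoids leaking information from later data into the scaling.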
Created:
2024-07-11