Electrical Substations Cybersecurity Dataset | Power System Security Dataset | Machine Learning Dataset

Source: GitHub; updated 2024-10-07; indexed 2024-10-08
Power system security
Machine learning
Download link:
https://github.com/esguti/cybersecurity-datasets
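Resource summary:
This dataset is intended for training and evaluating machine learning models for electrical substation cybersecurity. It contains network captures of the IEC61850 and IEC104 protocols.
Created:
2024-10-06
Summary of original information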

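Dataset overview

Research background

The dataset is part of a research work whose title translates as "Training dataset for machine-learning-based intrusion detection systems in electrical substations", which is currently awaiting approval for publication. Its purpose is to train and evaluate machine learning models for electrical substation cybersecurity.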

Data format

  • Dataset file format: PCAP or PCAPNG
  • Data source: IEC61850 or IEC60870-5-104 (also known as IEC104)
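
Because the captures may arrive as either PCAP or PCAPNG, it can help to check the container format before processing. The following is a minimal sketch that inspects the first four bytes of a file; the function names are hypothetical and are not part of the dataset's scripts. The magic numbers are the standard ones for the two formats.

```python
import struct

# pcapng files start with a Section Header Block whose type reads the same
# in either byte order.
PCAPNG_SHB = 0x0A0D0D0A
# Classic pcap magic numbers: microsecond and nanosecond resolution,
# each in both byte orders.
PCAP_MAGICS = {0xA1B2C3D4, 0xD4C3B2A1, 0xA1B23C4D, 0x4D3CB2A1}


def capture_format(header: bytes) -> str:
    """Return 'pcap', 'pcapng', or 'unknown' for the leading bytes of a file."""
    if len(header) < 4:
        return "unknown"
    (magic,) = struct.unpack(">I", header[:4])
    if magic == PCAPNG_SHB:
        return "pcapng"
    if magic in PCAP_MAGICS:
        return "pcap"
    return "unknown"


def capture_format_of_file(path: str) -> str:
    """Read the first four bytes of a capture file and classify them."""
    with open(path, "rb") as handle:
        return capture_format(handle.read(4))
```

Checking the magic number rather than the file extension avoids surprises when a `.pcap` file is in fact PCAPNG, which is the default output format of recent Wireshark tools.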

Data processing tools

  • tshark: used by the preprocessing scripts
  • Sanicap: used for anonymization
  • CICFlowMeter: used for feature extraction

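Data processing pipeline

  1. Filter and split: filter the captures with tshark (Wireshark's command-line tool) and split large files into 10 GB units.
  2. Merge: merge the split files.
  3. Anonymize: anonymize the PCAPNG files with Sanicap.
  4. Generate CSV: extract features according to the IEC104 or IEC61850 protocol and write them to CSV files.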
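
The filter and merge steps above can be sketched as command lines built in Python. This is an illustration under stated assumptions, not the repository's preprocessing script: the display filters (`iec60870_104`, `goose or sv or mms`) are assumed choices, `mergecap` is an assumed merge tool (the listing does not name one), and the file names are hypothetical. The size-based 10 GB split and the Sanicap step are left out because the listing does not give their commands.

```python
from typing import List, Sequence

# Assumed Wireshark display filters; adjust them to the traffic actually captured.
IEC104_FILTER = "iec60870_104"
IEC61850_FILTER = "goose or sv or mms"


def tshark_filter_cmd(src: str, dst: str, display_filter: str) -> List[str]:
    """Build a tshark command that reads src, keeps matching packets, and writes dst."""
    return ["tshark", "-r", src, "-Y", display_filter, "-w", dst]


def mergecap_cmd(parts: Sequence[str], dst: str) -> List[str]:
    """Build a mergecap command that merges several capture files into dst."""
    if not parts:
        raise ValueError("at least one input capture is required")
    return ["mergecap", "-w", dst, *parts]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Building the argument list separately from running it keeps the commands easy to log and inspect before a long job starts.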

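Using the dataset

  • Testing machine learning algorithms: Python scripts run machine learning algorithms against the dataset.
  • Label generation: the label is the text after the last "-" in the file name and identifies the attack type, or the absence of an attack.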
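
The labelling rule above can be sketched in a few lines. The function name and the example file names are hypothetical, and the sketch assumes the file extension is not part of the label, which the listing does not state.

```python
from pathlib import PurePath


def label_from_filename(filename: str) -> str:
    """Return the text after the last '-' in the file name, without the extension."""
    stem = PurePath(filename).stem  # drops only the final suffix, e.g. ".csv"
    _, separator, label = stem.rpartition("-")
    if not separator:
        raise ValueError(f"no '-' in file name: {filename!r}")
    return label
```

For example, `label_from_filename("capture-01-dos.csv")` yields `"dos"` and `label_from_filename("substation-normal.pcapng")` yields `"normal"`. Raising an error when no "-" is present avoids silently using a whole file name as a class label.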

Requirements

  • Python and the required libraries
  • GPU (optional, but recommended)

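Installation and execution

  • Install the tools: tshark, Sanicap, and CICFlowMeter.
  • Install Python and the libraries: create a virtual environment with conda and install the required Python libraries.
  • Run the IDS: after activating the virtual environment, run the pycaret_ids.py script to test the dataset.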
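
Before running the pipeline, it may be worth confirming that the external tools are reachable from the active environment. The sketch below is one way to do that with the standard library. The executable names for Sanicap and CICFlowMeter are assumptions, since the listing names the tools but not their commands, and `missing_tools` is a hypothetical helper.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names; only "tshark" is a known command name here.
ASSUMED_EXECUTABLES = ("tshark", "sanicap", "cicflowmeter")


def missing_tools(
    names: Iterable[str] = ASSUMED_EXECUTABLES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names that cannot be found on PATH, in the order given."""
    return [name for name in names if which(name) is None]
```

Calling `missing_tools()` with no arguments checks the assumed names against the current PATH; an empty list means every tool was found. The `which` parameter exists only so the lookup can be replaced in tests.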
AI-collected summary
Dataset introduction
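Construction
The dataset is built by preprocessing network captures from electrical substations and extracting features from them. It contains network traffic of the IEC61850 and IEC60870-5-104 protocols, stored in PCAP format. tshark filters the captures and splits large PCAP files into 10 GB units, which are then merged and anonymized. For feature extraction, CICFlowMeter is used for IEC104 and tshark for IEC61850, and the result is a set of CSV files that include attack labels.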
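Features
The dataset is designed specifically for electrical substation cybersecurity. It covers network traffic of two protocols, IEC61850 and IEC60870-5-104, and provides extracted features for both. The captures are anonymized before release to protect sensitive information. Because the data are labeled, they can be used directly to train intrusion detection systems, in particular those based on machine learning models.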
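Usage
To use the dataset, first install the required tools and dependencies, such as tshark, Sanicap, and CICFlowMeter. Then run the preprocessing scripts to filter, split, and merge the PCAP files. For feature extraction, choose the script that matches the protocol to generate the CSV files. Finally, the Python scripts in the IDS folder train and evaluate machine learning models on the generated CSV dataset. Setting the parameter use_gpu=True enables GPU acceleration.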
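Background and challenges
Background
The Electrical Substations Cybersecurity Dataset was created to train machine learning models for intrusion detection systems (IDS) in electrical substation networks. The underlying research is currently awaiting approval for publication. Its central question is how machine learning models can effectively identify and defend against network attacks on electrical substations. Building the dataset involved preprocessing and testing network captures of the IEC61850 and IEC60870-5-104 (IEC104) protocols, with the aim of supporting substation cybersecurity.
Current challenges
Building the dataset raises several challenges. First, the source data are network captures in PCAP format, which require several preprocessing steps (filtering, splitting, merging, and anonymization) to ensure data quality and safety. Second, feature extraction has to distinguish between IEC104 and IEC61850, which makes the processing more complex. Third, attack types must be labeled reliably so that machine learning models can identify and classify different network attacks.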
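
To illustrate why the two protocol families need separate handling, the sketch below tells them apart using well-known identifiers: IEC 60870-5-104 runs over TCP port 2404, IEC 61850 GOOSE and Sampled Values use the EtherTypes 0x88B8 and 0x88BA, and IEC 61850 MMS commonly runs over TCP port 102. This is a heuristic for illustration only; `protocol_family` is a hypothetical helper, and the listing does not describe how the dataset's scripts make this decision.

```python
from typing import Optional

ETHERTYPE_GOOSE = 0x88B8  # IEC 61850 GOOSE, carried directly over Ethernet
ETHERTYPE_SV = 0x88BA     # IEC 61850 Sampled Values, carried directly over Ethernet
TCP_PORT_IEC104 = 2404    # IEC 60870-5-104
TCP_PORT_MMS = 102        # ISO transport over TCP, used by IEC 61850 MMS


def protocol_family(ethertype: int, tcp_port: Optional[int] = None) -> str:
    """Guess the protocol family of a frame from its EtherType and TCP port."""
    if ethertype in (ETHERTYPE_GOOSE, ETHERTYPE_SV):
        return "IEC61850"
    if tcp_port == TCP_PORT_IEC104:
        return "IEC104"
    if tcp_port == TCP_PORT_MMS:
        # Port 102 is shared with other ISO-on-TCP protocols, so this is a guess.
        return "IEC61850"
    return "other"
```

The split matters for feature extraction: IEC104 is TCP-based, so flow-level tools apply to it, whereas GOOSE and Sampled Values frames have no IP or TCP headers and need packet-level handling.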
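Common scenarios
Typical use
In power system security, the Electrical Substations Cybersecurity Dataset is used mainly to train and evaluate intrusion detection systems based on machine learning models. Network capture files of the IEC61850 or IEC60870-5-104 protocol are processed to extract key features, which are then used to train multiple machine learning algorithms, for example through libraries such as PyCaret and lazypredict, to identify and defend against network attacks on electrical substations.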
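Practical applications
In practice, the Electrical Substations Cybersecurity Dataset is used by power companies and cybersecurity firms to develop intrusion detection systems. Models trained on the dataset can monitor substation network traffic in real time and detect and respond to potential network threats promptly, helping to keep the power supply continuous and stable. The dataset also supports government and research institutions in formulating and evaluating cybersecurity policy.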
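Derived work
Building on the Electrical Substations Cybersecurity Dataset, researchers have developed a range of intrusion detection systems and published related papers. For example, some studies train deep learning models on the dataset and report significantly higher detection accuracy, while others combine it with reinforcement learning to build adaptive intrusion defense strategies. This derived work extends research on power system cybersecurity and opens further possibilities for practical use.
The content above was collected and summarized by AI.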