Download link:

https://zenodo.org/record/7956303

Resource description:

This dataset was prepared and used as the validation set during the evaluation phase of "Meta IDS" to assess the performance of various machine learning models. It is now available to users and researchers who need a reliable and diverse dataset for training and testing their own models.

The validation dataset is a large collection of labeled entries, each indicating whether the packet is "malicious" or "benign." It covers complex design patterns that are commonly encountered in real-world applications. The dataset is designed to be representative: it spans the edge and fog layers that communicate with the cloud layer, which enables thorough testing and evaluation of different models. Each sample is labeled with its ground truth, providing a reliable reference for evaluating model performance.

For convenient distribution and storage, the dataset has been split into three batches, each provided as a separate compressed file, so that it can be downloaded and managed in parts. To extract the data:

1. Download and install bzip2 (if it is not already installed) from the official website or your package manager.
2. Place the compressed dataset file in a directory of your choice.
3. Open a terminal or command prompt and navigate to the directory containing the compressed file.
4. Run the following command to decompress the dataset: bzip2 -d filename.bz2
   Replace "filename.bz2" with the actual name of the compressed dataset file.

Once decompressed, the dataset is available in its original format for further exploration, analysis, model training, and so on.
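The bzip2 steps above can also be carried out from Python with the standard-library bz2 module, which streams the data in chunks so that a batch of several hundred gigabytes never has to fit in memory. This is a minimal sketch; the function name decompress_bz2 and the default chunk size are illustrative choices, not part of the dataset's instructions. Note that `bzip2 -d` deletes the .bz2 file after a successful decompression unless `-k` is given, whereas this sketch keeps the source file, so both files need disk space at the same time.

```python
import bz2
import shutil
from pathlib import Path
from typing import Optional


def decompress_bz2(src: str, dst: Optional[str] = None,
                   chunk_size: int = 16 * 1024 * 1024) -> Path:
    """Stream-decompress a .bz2 file to disk without loading it into memory.

    If dst is omitted, the output name is src with the ".bz2" suffix removed,
    mirroring the file name that `bzip2 -d` produces. The source is kept.
    """
    src_path = Path(src)
    if dst is None:
        if src_path.suffix != ".bz2":
            raise ValueError(f"expected a .bz2 file, got {src_path.name}")
        dst_path = src_path.with_suffix("")  # "batch.csv.bz2" -> "batch.csv"
    else:
        dst_path = Path(dst)
    # bz2.open also handles multi-stream archives (e.g. produced by pbzip2).
    with bz2.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)  # copy in fixed-size chunks
    return dst_path
```

For example, decompress_bz2("batch1.csv.bz2") would write batch1.csv next to the input; the file name here is hypothetical, so substitute the actual name of each downloaded batch.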
Extraction requires approximately 800 GB of storage in total: about 302 GB for the first batch, about 203 GB for the second, and about 297 GB for the third. The first batch contains 1,049,527,992 entries, the second 711,043,331 entries, and the third and last batch 1,029,303,062 entries.

The following table lists the features in the extracted dataset, with a description and an example value for each.

| Feature | Description | Example value |
|---|---|---|
| ip.src | Source IP address in the packet | a05d4ecc38da01406c9635ec694917e969622160e728495e3169f62822444e17 |
| ip.dst | Destination IP address in the packet | a52db0d87623d8a25d0db324d74f0900deb5ca4ec8ad9f346114db134e040ec5 |
| frame.time_epoch | Epoch time of the frame | 1676165569.930869 |
| arp.hw.type | Hardware type | 1 |
| arp.hw.size | Hardware size | 6 |
| arp.proto.size | Protocol size | 4 |
| arp.opcode | Opcode | 2 |
| data.len | Data length | 2713 |
| eth.dst.lg | Destination LG bit (locally administered address) | 1 |
| eth.dst.ig | Destination IG bit (individual/group address) | 1 |
| eth.src.lg | Source LG bit (locally administered address) | 1 |
| eth.src.ig | Source IG bit (individual/group address) | 1 |
| frame.offset_shift | Time shift for this packet | 0 |
| frame.len | Frame length on the wire | 1208 |
| frame.cap_len | Frame length stored in the capture file | 215 |
| frame.marked | Frame is marked | 0 |
| frame.ignored | Frame is ignored | 0 |
| frame.encap_type | Encapsulation type | 1 |
| gre | Generic Routing Encapsulation | 'Generic Routing Encapsulation (IP)' |
| ip.version | Version | 6 |
| ip.hdr_len | Header length | 24 |
| ip.dsfield.dscp | Differentiated Services Codepoint | 56 |
| ip.dsfield.ecn | Explicit Congestion Notification | 2 |
| ip.len | Total length | 614 |
| ip.flags.rb | Reserved bit | 0 |
| ip.flags.df | Don't fragment | 1 |
| ip.flags.mf | More fragments | 0 |
| ip.frag_offset | Fragment offset | 0 |
| ip.ttl | Time to live | 31 |
| ip.proto | Protocol | 47 |
| ip.checksum.status | Header checksum status | 2 |
| tcp.srcport | TCP source port | 53425 |
| tcp.flags | TCP flags | 0x00000098 |
| tcp.flags.ns | Nonce | 0 |
| tcp.flags.cwr | Congestion Window Reduced (CWR) | 1 |
| udp.srcport | UDP source port | 64413 |
| udp.dstport | UDP destination port | 54087 |
| udp.stream | UDP stream index | 1345 |
| udp.length | UDP length | 225 |
| udp.checksum.status | UDP checksum status | 3 |
| packet_type | Type of the packet, either "benign" or "malicious" | 0 |

Furthermore, in compliance with the GDPR and to protect the privacy of individuals, all IP addresses in the dataset have been anonymized through hashing. This protects the identity of individuals while preserving the integrity and utility of the dataset for research and model development.

Please note that although the dataset provides valuable insights and a solid foundation for machine learning tasks, it is not a substitute for extensive real-world data collection. It nevertheless serves as a valuable resource for researchers, practitioners, and enthusiasts in the machine learning community, offering a compliant, anonymized dataset for developing and validating custom models in a specific problem domain. By using the validation dataset to evaluate machine learning models and to train custom ones, users can accelerate their research and development, building on the knowledge gained from my thesis while contributing to the advancement of the field.
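As a quick consistency check on the batch sizes and entry counts given in the description, the three extracted sizes add up to about 802 GB, which matches the stated total of approximately 800 GB, and each entry works out to a little under 300 bytes. The sketch below only re-does that arithmetic; it assumes 1 GB = 10^9 bytes, which the description does not specify.

```python
# Figures quoted in the dataset description (GB = approximate extracted size).
batches = {
    "batch 1": {"entries": 1_049_527_992, "gb": 302},
    "batch 2": {"entries": 711_043_331, "gb": 203},
    "batch 3": {"entries": 1_029_303_062, "gb": 297},
}

total_entries = sum(b["entries"] for b in batches.values())
total_gb = sum(b["gb"] for b in batches.values())

for name, b in batches.items():
    # Rough average size of one extracted entry, ASSUMING 1 GB = 10**9 bytes.
    bytes_per_entry = b["gb"] * 10**9 / b["entries"]
    print(f"{name}: {b['entries']:,} entries, ~{bytes_per_entry:.0f} bytes/entry")

print(f"total: {total_entries:,} entries, ~{total_gb} GB")
```

This kind of estimate is mainly useful for sizing: a model or preprocessing job that touches every entry will read roughly 2.8 billion rows.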
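The tcp.flags feature in the table is stored as a hexadecimal bit field (example value 0x00000098), while some individual flags such as tcp.flags.ns and tcp.flags.cwr are also given as separate columns. The sketch below decodes the bit field using the standard TCP header flag positions; the function name decode_tcp_flags is hypothetical. For the example value, 0x98 = 0x80 + 0x10 + 0x08, i.e. CWR, ACK and PSH, which is consistent with the example values tcp.flags.cwr = 1 and tcp.flags.ns = 0.

```python
from typing import List

# Standard TCP flag bits (RFC 793; ECE/CWR from RFC 3168; NS from RFC 3540).
TCP_FLAG_BITS = {
    0x001: "FIN",
    0x002: "SYN",
    0x004: "RST",
    0x008: "PSH",
    0x010: "ACK",
    0x020: "URG",
    0x040: "ECE",
    0x080: "CWR",
    0x100: "NS",
}


def decode_tcp_flags(value: str) -> List[str]:
    """Return the names of the flags set in a tcp.flags value such as '0x00000098'."""
    bits = int(value, 16)  # int() accepts the '0x' prefix when base is 16
    # Dicts keep insertion order, so names come out from lowest to highest bit.
    return [name for mask, name in TCP_FLAG_BITS.items() if bits & mask]


print(decode_tcp_flags("0x00000098"))  # ['PSH', 'ACK', 'CWR']
```

Expanding the bit field like this gives a model one binary input per flag instead of a single opaque hex string.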
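The feature names in the table appear to follow Wireshark's field naming, where frame.time_epoch is the capture time in seconds since 1970-01-01 UTC. A minimal sketch for turning it into a timezone-aware timestamp, for example to split data by capture time, is shown below; the helper name frame_time_utc is hypothetical.

```python
from datetime import datetime, timezone


def frame_time_utc(epoch: float) -> datetime:
    """Convert a frame.time_epoch value (seconds since the Unix epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


ts = frame_time_utc(1676165569.930869)  # example value from the feature table
print(ts.isoformat())  # 2023-02-12T01:32:49.930869+00:00
```

Passing tz=timezone.utc avoids silently converting to the local time zone of the machine that processes the data.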
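Because each batch holds roughly a billion entries, it can be practical to inspect the packet_type labels directly from the compressed file instead of decompressing it first. The description does not state the file format of the extracted data, so the sketch below ASSUMES a CSV file whose header row uses the feature names from the table; the function name count_packet_types and the limit parameter are hypothetical. The example value for packet_type is 0, and the mapping from numeric codes to "benign" or "malicious" is not given, so the sketch counts the raw values rather than guessing their meaning.

```python
import bz2
import csv
from collections import Counter
from typing import Optional


def count_packet_types(path: str, label_column: str = "packet_type",
                       limit: Optional[int] = None) -> Counter:
    """Count raw label values in a bzip2-compressed CSV, streaming row by row.

    ASSUMPTION: the batch is a CSV with a header row containing 'packet_type'.
    Pass limit to stop after that many data rows (useful for a quick sample).
    """
    counts: Counter = Counter()
    # Text mode with newline="" is what the csv module expects.
    with bz2.open(path, "rt", newline="") as fh:
        reader = csv.DictReader(fh)
        for i, row in enumerate(reader):
            if limit is not None and i >= limit:
                break
            counts[row[label_column]] += 1
    return counts
```

For instance, count_packet_types("batch1.csv.bz2", limit=1_000_000) would give a quick look at the class balance in the first million rows of a (hypothetically named) batch file. Note that the first rows may not be representative of the whole batch.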
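The description says the IP addresses were anonymized through hashing but does not name the hash function. The example ip.src and ip.dst values are 64 hexadecimal digits, which is consistent with a 256-bit digest such as SHA-256. The sketch below shows how such pseudonyms can be produced, for instance to anonymize your own captures in the same style before combining them with this dataset; the choice of SHA-256, the optional key, and the function name anonymize_ip are all assumptions, not the method used for this dataset.

```python
import hashlib
import hmac
from typing import Optional


def anonymize_ip(ip: str, key: Optional[bytes] = None) -> str:
    """Return a 64-hex-digit pseudonym for an IP address string.

    HYPOTHETICAL scheme: plain SHA-256 without a key, HMAC-SHA-256 with one.
    The same input always maps to the same pseudonym, so flows stay linkable.
    """
    data = ip.encode("ascii")  # IPv4 and IPv6 text forms are pure ASCII
    if key is None:
        return hashlib.sha256(data).hexdigest()
    return hmac.new(key, data, hashlib.sha256).hexdigest()
```

Keeping the mapping deterministic preserves the relationships between packets from the same host, which is what detection models rely on. A secret key is worth considering because an unkeyed hash of an IPv4 address can be reversed by hashing all 2^32 possible addresses.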

Application scenario:

Federated Learning for Distributed Intrusion Detection Systems in Public Networks - Validation Dataset