Host network traffic time series 2019/01
Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-27.
Download link:
https://zenodo.org/record/2669079
Resource description:
General info
The dataset was collected over a one-month period in January 2019. The observation points for the collection of IP flows were located at the borders of the university campus network. The campus network has a /16 CIDR IPv4 range at its disposal and contains various network segments, ranging from segments connecting dormitories, through server segments, to a segment containing the workstations of university administrative staff. The raw IP flows used to create the dataset totaled over 860 GB. A host in our dataset is identified by its source IPv4 address.
Variables
The dataset contains the following variables:
Aggregations: five-minute total volumes aggregated over disjoint one-hour windows using the mean/max/min aggregation functions
- # of flows (FL): number of flows for a given source IP
- # of packets (PKT): number of packets for a given source IP
- # of bytes (BYT): number of bytes for a given source IP
- Flow duration (DUR): average flow duration in seconds
Distinct counts: number of distinct values of each variable within a five-minute window, aggregated over disjoint one-hour windows using the mean/max/min aggregation functions
- # of peers (PEER): number of distinct communication peers for a given source IP
- # of ports (PORTS): number of distinct destination ports for a given source IP
- # of protocols (PROTO): number of distinct communication protocols for a given source IP
- # of AS numbers (AS): number of distinct destination AS numbers for a given source IP
- # of countries (CTRY): number of distinct destination countries for a given source IP
Labels
- Range (RNG): the network range a host belongs to (anonymized)
- Unit (UNT): the administrative unit owning the network range
- Sub-unit (SUB-UNT): a sub-unit of the unit
Dataset format
The dataset is in comma-separated values (CSV) format.
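The aggregation and distinct-count variables are built in two stages: per-source totals (or distinct counts) within each five-minute window, then mean/max/min over the twelve five-minute windows of each disjoint hour. The following is a minimal Python sketch of that pipeline on toy flow records, not the authors' code. The flow tuple layout, the helper names `five_minute_features` and `hourly_aggregates`, and the choice to count a five-minute window with no traffic as 0 are all assumptions; DUR, PROTO, AS, and CTRY are omitted for brevity.

```python
from statistics import mean

WINDOW_S = 300        # five-minute windows, in seconds
BINS_PER_HOUR = 12    # 12 x 5 min = one disjoint one-hour window


def five_minute_features(flows):
    """Per (src_ip, window index): volume totals and distinct counts.

    Each flow is a hypothetical tuple
    (src_ip, start_s, dst_ip, dst_port, packets, bytes),
    where start_s is seconds since midnight.
    """
    feats = {}
    for src, start_s, dst, port, pkts, nbytes in flows:
        key = (src, start_s // WINDOW_S)
        f = feats.setdefault(key, {"FL": 0, "PKT": 0, "BYT": 0,
                                   "PEER": set(), "PORTS": set()})
        f["FL"] += 1
        f["PKT"] += pkts
        f["BYT"] += nbytes
        f["PEER"].add(dst)     # distinct communication peers
        f["PORTS"].add(port)   # distinct destination ports
    # Collapse the distinct-value sets into counts.
    return {key: {name: len(v) if isinstance(v, set) else v
                  for name, v in f.items()}
            for key, f in feats.items()}


def hourly_aggregates(feats, src, hour, variable):
    """Mean/max/min of one variable over the 12 windows of a given hour.

    Assumption: a five-minute window with no flows counts as 0.
    """
    first = hour * BINS_PER_HOUR
    values = [feats.get((src, b), {}).get(variable, 0)
              for b in range(first, first + BINS_PER_HOUR)]
    return {"mean": mean(values), "max": max(values), "min": min(values)}


# Usage: three toy flows from one host, two in window 0 and one in window 1.
flows = [
    ("10.0.0.1", 0, "192.0.2.1", 53, 2, 120),
    ("10.0.0.1", 10, "198.51.100.7", 53, 2, 130),
    ("10.0.0.1", 400, "192.0.2.1", 443, 10, 5000),
]
feats = five_minute_features(flows)
print(hourly_aggregates(feats, "10.0.0.1", 0, "FL"))
# → {'mean': 0.25, 'max': 2, 'min': 0}
```

Under the zero-fill assumption, a host that is active for only a few minutes gets a small hourly mean but a large max; if the real pipeline skips empty windows instead, the mean and min would differ.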
Header: multilevel, occupying the first 3 lines
- Level 1: aggregation type {mean|min|max}
- Level 2: variable (see above)
- Level 3: hour of the day {00, 01, 02, 03, ..., 22, 23}
Labels: the last 4 columns
Dataset size
- Rows: 65536 host records + 3 header lines
- Columns: 648 variables + 4 labels
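Because the header spans three lines, each variable column is identified by an (aggregation, variable, hour) triple, and the counts agree: 3 aggregation types × 9 variables × 24 hours = 648 variable columns. Here is a minimal sketch of reading the file with Python's standard `csv` module. It assumes the header cells literally contain the names listed above (mean/min/max, FL through CTRY, 00 through 23), that any blank header cells should be forward-filled, and that every variable cell is numeric. The function names are hypothetical, and the column order within the file is not assumed.

```python
import csv
import io
from itertools import product

# Header vocabulary as described above (assumed to appear verbatim in the file).
AGGS = ["mean", "min", "max"]
VARIABLES = ["FL", "PKT", "BYT", "DUR", "PEER", "PORTS", "PROTO", "AS", "CTRY"]
HOURS = [f"{h:02d}" for h in range(24)]
N_LABELS = 4  # the last 4 columns are labels

# 3 aggregation types x 9 variables x 24 hours = 648 variable columns
assert len(AGGS) * len(VARIABLES) * len(HOURS) == 648


def read_multilevel(text):
    """Return (column keys, feature rows, label rows) from the CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[:3], rows[3:]
    # Forward-fill blank header cells (an assumption about how levels are written).
    filled = []
    for level in header:
        last, cells = "", []
        for cell in level:
            last = cell or last
            cells.append(last)
        filled.append(cells)
    # One (aggregation, variable, hour) triple per variable column.
    keys = list(zip(*filled))[:-N_LABELS]
    # Assumes every variable cell parses as a number.
    features = [[float(x) for x in row[:-N_LABELS]] for row in body]
    labels = [row[-N_LABELS:] for row in body]
    return keys, features, labels


def has_expected_columns(keys):
    """True if every (aggregation, variable, hour) triple occurs exactly once."""
    return (len(keys) == len(set(keys))
            and set(keys) == set(product(AGGS, VARIABLES, HOURS)))
```

Calling `has_expected_columns` on the parsed keys is a cheap check that the header matches these assumptions before relying on column positions.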
Created:
2023-06-28



