
SFR-M Flow Token Dataset and Embeddings (ISCX VPN-nonVPN)

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/wc48j3hn7w
Description:
This dataset contains flow-level representations derived from encrypted and non-encrypted network traffic, constructed according to the Symbolic Flow Representation based on the First-M packets (SFR-M) methodology. The data are based on the ISCX VPN-nonVPN traffic collection and include bidirectional flows generated from the first four packets per direction of each flow. For each packet, the first 64 bytes of the payload are extracted and converted into symbolic token sequences using two tokenization granularities (b=2 and b=8). The resulting flow-token datasets are provided in Parquet format, enabling efficient storage and scalable analysis.

In addition to the tokenized flow datasets, this collection also includes the corresponding embedding datasets generated from the symbolic tokens using a transformer-based sentence embedding model. Each flow is represented by a fixed-dimensional embedding obtained through packet-level encoding and flow-level aggregation, preserving directional information while avoiding explicit temporal features.

The datasets support research on early traffic classification, VPN detection, and application-level traffic analysis, and are intended to facilitate reproducibility, benchmarking, and further studies on flow-based representations for encrypted network traffic.
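
The tokenization step described above can be illustrated with a minimal sketch. It is not the dataset authors' code. It assumes that the granularity b is the number of bits per token, which the description does not define, and the token spelling (`t0`, `t1`, ...), the direction labels `fwd`/`bwd`, and the function names are all hypothetical. Only the limits of four packets per direction and 64 payload bytes per packet come from the description.

```python
from typing import Dict, Iterable, List, Tuple

PAYLOAD_BYTES = 64  # first 64 payload bytes per packet (from the description)
FIRST_M = 4         # first four packets per direction (from the description)


def tokenize_payload(payload: bytes, b: int) -> List[str]:
    """Split the first 64 payload bytes into b-bit symbols.

    ASSUMPTION: b is the number of bits per token. The token names
    ("t<value>") are hypothetical, chosen only for illustration.
    """
    if b not in (1, 2, 4, 8):
        raise ValueError("this sketch only supports b that divides 8")
    mask = (1 << b) - 1
    tokens: List[str] = []
    for byte in payload[:PAYLOAD_BYTES]:
        # Walk each byte from its most significant b-bit chunk to the least.
        for shift in range(8 - b, -1, -b):
            tokens.append(f"t{(byte >> shift) & mask}")
    return tokens


def tokenize_flow(
    packets: Iterable[Tuple[str, bytes]], b: int
) -> Dict[str, List[List[str]]]:
    """Keep the first FIRST_M packets per direction and tokenize each one.

    `packets` yields (direction, payload) pairs in capture order, where
    direction is "fwd" or "bwd" (hypothetical labels).
    """
    flow: Dict[str, List[List[str]]] = {"fwd": [], "bwd": []}
    for direction, payload in packets:
        if len(flow[direction]) < FIRST_M:
            flow[direction].append(tokenize_payload(payload, b))
    return flow
```

Under this assumption, b=2 turns each byte into four tokens drawn from a four-symbol alphabet (up to 256 tokens per packet), while b=8 yields one token per byte from a 256-symbol alphabet (up to 64 tokens per packet). Keeping the two directions in separate lists mirrors the description's point that directional information is preserved without timestamps.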
Created:
2026-01-14