
VPN-nonVPN dataset

DataCite Commons | Updated 2020-09-20 | Indexed 2025-04-09
Download link:
https://www.impactcybertrust.org/dataset_view?idDataset=929
Description:
To generate a representative dataset of real-world traffic at ISCX, we defined a set of tasks to ensure that the dataset is rich enough in both diversity and quantity. We created accounts for users Alice and Bob in order to use services like Skype, Facebook, etc. Below we provide the complete list of the types of traffic and the applications considered for each traffic type (VoIP, P2P, etc.). For each of the seven traffic types we captured a regular session and a session over VPN, so we have a total of 14 traffic categories: VOIP, VPN-VOIP, P2P, VPN-P2P, etc. We also give a detailed description of the different types of traffic generated:

Browsing: Under this label we have HTTPS traffic generated by users while browsing or performing any task that includes the use of a browser. For instance, when we captured voice calls using Hangouts, even though browsing is not the main activity, we captured several browsing flows.

Email: The traffic samples generated using a Thunderbird client and Alice's and Bob's Gmail accounts. The clients were configured to deliver mail through SMTP/S, and to receive it using POP3/SSL in one client and IMAP/SSL in the other.

Chat: The chat label identifies instant-messaging applications. Under this label we have Facebook and Hangouts via web browsers, Skype, and AIM and ICQ using an application called Pidgin [14].

Streaming: The streaming label identifies multimedia applications that require a continuous and steady stream of data. We captured traffic from YouTube (HTML5 and Flash versions) and Vimeo services using Chrome and Firefox.

File Transfer: This label identifies traffic from applications whose main purpose is to send or receive files and documents. For our dataset we captured Skype file transfers, FTP over SSH (SFTP) and FTP over SSL (FTPS) traffic sessions.

VoIP: The Voice over IP label groups all traffic generated by voice applications. Within this label we captured voice calls using Facebook, Hangouts and Skype.
P2P: This label is used to identify file-sharing protocols like BitTorrent. To generate this traffic we downloaded different .torrent files from a public repository and captured traffic sessions using the uTorrent and Transmission applications.

The traffic was captured using Wireshark and tcpdump, generating a total of 28 GB of data. For the VPN, we used an external VPN service provider and connected to it using OpenVPN (UDP mode). To generate SFTP and FTPS traffic we also used an external service provider, with FileZilla as the client.

To facilitate the labeling process, all unnecessary services and applications were closed while capturing the traffic. (The only application running was the target of the capture, e.g., a Skype voice call, an SFTP file transfer, etc.) We used a filter to capture only packets whose source or destination IP was the address of the local client (Alice or Bob).

The full research paper outlining the details of the dataset and its underlying principles: Gerard Drapper Gil, Arash Habibi Lashkari, Mohammad Mamun, Ali A. Ghorbani, "Characterization of Encrypted and VPN Traffic Using Time-Related Features", In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP 2016), pages 407-414, Rome, Italy.

ISCXFlowMeter is written in Java; it reads the pcap files and creates the CSV file based on selected features. The UNB ISCX Network Traffic (VPN-nonVPN) dataset consists of labeled network traffic, including full packets in pcap format and CSV files (flows generated by ISCXFlowMeter), and is publicly available to researchers. For more information, contact cic@unb.ca.
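The capture filter described above (only packets whose source or destination IP is the local client's address) corresponds to a `host` expression in tcpdump's filter syntax. The following is a minimal sketch, not the authors' actual command: the function name, interface name, IP address, and output file name are all illustrative assumptions.

```python
import ipaddress
import shlex


def build_capture_command(client_ip, interface, out_path):
    """Build a tcpdump argument list that keeps only the client's traffic.

    All parameter values are hypothetical; the dataset description does not
    state the exact command line that was used.
    """
    # Validate the address early so a typo does not produce a broken filter.
    ip = ipaddress.ip_address(client_ip)
    # In tcpdump filter syntax, "host X" matches packets whose source OR
    # destination address is X.
    bpf_filter = f"host {ip}"
    # -i selects the capture interface, -w writes raw packets to a pcap file.
    return ["tcpdump", "-i", interface, "-w", out_path, bpf_filter]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address standing in for Alice's client.
    cmd = build_capture_command("192.0.2.10", "eth0", "alice_skype_voip.pcap")
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the filter expression as one argument, so it can be passed to `subprocess.run` without shell quoting issues.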
The UNB ISCX Network Traffic Dataset content (traffic type: content):

- Web Browsing: Firefox and Chrome
- Email: SMTPS, POP3S and IMAPS
- Chat: ICQ, AIM, Skype, Facebook and Hangouts
- Streaming: Vimeo and YouTube
- File Transfer: Skype, FTPS and SFTP using FileZilla and an external service
- VoIP: Facebook, Skype and Hangouts voice calls (1h duration)
- P2P: uTorrent and Transmission (BitTorrent)
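The 14 categories mentioned in the description follow from pairing each of the seven traffic types listed here with a regular and a VPN capture. A minimal sketch of that enumeration is shown below. Only VOIP, VPN-VOIP, P2P and VPN-P2P are spelled out in the description; the spelling of the other labels and the function name are assumptions made for illustration.

```python
# The seven traffic types from the dataset description. Apart from VOIP and
# P2P, the exact label spellings are assumptions.
TRAFFIC_TYPES = [
    "BROWSING",
    "EMAIL",
    "CHAT",
    "STREAMING",
    "FILE-TRANSFER",
    "VOIP",
    "P2P",
]


def traffic_categories(types=TRAFFIC_TYPES):
    """Return one regular and one VPN label for every traffic type."""
    labels = []
    for traffic_type in types:
        labels.append(traffic_type)            # regular session
        labels.append(f"VPN-{traffic_type}")   # same activity over VPN
    return labels


if __name__ == "__main__":
    categories = traffic_categories()
    print(len(categories))
    print(categories)
```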

Provider:
IMPACT
Created:
2018-10-25
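Background and challenges
Background overview
The VPN-nonVPN dataset is a labeled network traffic dataset provided by the University of New Brunswick. It contains VPN and non-VPN traffic records for a variety of network activities, in pcap and CSV formats, and is suited to network traffic analysis and encryption research.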