HTTP DATASET CSIC 2010

DataCite Commons | Updated 2022-10-05 | Indexed 2024-07-13

Download link:

https://www.impactcybertrust.org/dataset_view?idDataset=940

Resource description:

The HTTP dataset CSIC 2010 contains generated traffic targeted at an e-commerce web application developed at our department. In this web application, users can buy items using a shopping cart and register by providing some personal information. Because the web application is in Spanish, the dataset contains some Latin characters. The dataset is generated automatically and contains 36,000 normal requests and more than 25,000 anomalous requests. The HTTP requests are labeled as normal or anomalous, and the dataset includes attacks such as SQL injection, buffer overflow, information gathering, file disclosure, CRLF injection, XSS, server-side include, and parameter tampering. This dataset has been used successfully for web attack detection in previous works [4, 5, 6, 7, 8, 9].

The traffic is generated in the following steps. First, real data are collected for all the parameters of the web application. All the data (names, surnames, addresses, etc.) are extracted from real databases. These values are stored in two databases: one for normal values and the other for anomalous ones. Additionally, all publicly available pages of the web application are listed. Next, normal and anomalous requests are generated for every web page. When a normal request has parameters, the parameter values are filled in with data taken at random from the normal database. The process is analogous for anomalous requests, where the parameter values are taken from the anomalous database.

Three types of anomalous requests were considered:

1) Static attacks try to request hidden (or non-existent) resources. These requests include obsolete files, session IDs in URL rewriting, configuration files, default files, etc.
2) Dynamic attacks modify valid request arguments: SQL injection, CRLF injection, cross-site scripting, buffer overflows, etc.
3) Unintentional illegal requests. These requests have no malicious intent; however, they do not follow the normal behavior of the web application and do not have the same structure as normal parameter values (for example, a telephone number composed of letters).

The attacks were generated with the help of tools such as Paros [10] and W3AF [11].

The WAFs with which this dataset was used [4, 5, 6, 7] follow the anomaly approach: the normal behavior of the web application is defined, and any behavior that deviates from it is considered anomalous. Therefore, in this approach only normal traffic is needed for the training phase.

The dataset is divided into three subsets: one for the training phase, containing only normal traffic, and two for the test phase, one with normal traffic and the other with malicious traffic.
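The generation procedure described above can be sketched roughly as follows. This is only an illustration, not the authors' tooling: the page paths, parameter names, and value pools are hypothetical, and in-memory dictionaries stand in for the two databases of normal and anomalous values.

```python
import random
from urllib.parse import urlencode

# Hypothetical value pools standing in for the "normal" and "anomalous" databases.
NORMAL_VALUES = {
    "nombre": ["Maria", "Jose"],
    "telefono": ["912345678", "934567890"],
}
ANOMALOUS_VALUES = {
    "nombre": ["' OR '1'='1", "<script>alert(1)</script>"],
    "telefono": ["abcdefghi"],  # unintentional illegal value: a phone number made of letters
}

# Hypothetical list of publicly available pages and the parameters each accepts.
PAGES = {
    "/tienda/registro.jsp": ["nombre", "telefono"],
    "/tienda/index.jsp": [],
}


def build_request(page, params, pool, rng):
    """Fill each parameter of a page with a random value from the given pool."""
    if not params:
        return f"GET {page} HTTP/1.1"
    query = urlencode({p: rng.choice(pool[p]) for p in params})
    return f"GET {page}?{query} HTTP/1.1"


def generate(rng):
    """Return (request_line, label) pairs for every listed page."""
    samples = []
    for page, params in PAGES.items():
        samples.append((build_request(page, params, NORMAL_VALUES, rng), "normal"))
        if params:  # parameter-based anomalies need parameters to fill
            samples.append((build_request(page, params, ANOMALOUS_VALUES, rng), "anomalous"))
    return samples


samples = generate(random.Random(0))
for line, label in samples:
    print(label, line)
```

The sketch covers only parameter-based anomalies; static attacks, which request hidden or non-existent resources, would need a separate list of target paths.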

Provider:

IMPACT

Created:

2018-10-25

Collected summary

Dataset introduction

Background and challenges

Background overview

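HTTP DATASET CSIC 2010 is a dataset for testing web attack protection systems. It contains automatically generated HTTP requests, of which 36,000 are normal and more than 25,000 are anomalous, covering attack types such as SQL injection, buffer overflow, and XSS. The dataset was developed by the Information Security Institute of the Spanish National Research Council (CSIC). It is generated from real data and split into a training subset (normal traffic only) and test subsets (normal and malicious traffic), which makes it suitable for research on anomaly-detection-based web application security.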
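The split into a normal-only training subset and separate test subsets fits an anomaly-based workflow, which might look roughly like the sketch below. This is a toy model and not the method of any published WAF: it assumes requests have already been reduced to query strings, and it learns only the character classes and maximum length seen for each parameter. The parameter names and sample values are hypothetical.

```python
from urllib.parse import parse_qsl


def char_classes(value):
    """Map a value to the set of coarse character classes it contains."""
    classes = set()
    for ch in value:
        if ch.isdigit():
            classes.add("digit")
        elif ch.isalpha():
            classes.add("alpha")
        elif ch.isspace():
            classes.add("space")
        else:
            classes.add("other")
    return classes


def train(normal_queries):
    """Learn, per parameter, the character classes and maximum length seen in normal traffic."""
    model = {}
    for query in normal_queries:
        for name, value in parse_qsl(query, keep_blank_values=True):
            classes, max_len = model.get(name, (set(), 0))
            model[name] = (classes | char_classes(value), max(max_len, len(value)))
    return model


def is_anomalous(model, query):
    """Flag a query with an unseen parameter, an unseen character class, or an overlong value."""
    for name, value in parse_qsl(query, keep_blank_values=True):
        if name not in model:
            return True
        classes, max_len = model[name]
        if not char_classes(value) <= classes or len(value) > max_len:
            return True
    return False


# Training uses normal traffic only (hypothetical samples).
model = train([
    "nombre=Maria&telefono=912345678",
    "nombre=Jose+Luis&telefono=934567890",
])

print(is_anomalous(model, "nombre=Ana&telefono=911111111"))  # normal-looking request
print(is_anomalous(model, "nombre=Ana&telefono=abcdefghi"))  # phone number made of letters
```

Because the model is built from normal traffic alone, it needs no attack samples to train; the two test subsets then measure false positives on normal requests and detection on malicious ones.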

The content above was collected and summarized by 遇见数据集.
