SQL Injection Attack Dataset

Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/mmc4sdmnrc/2
Description:
The dataset presented in this paper is intended for training supervised machine learning algorithms that detect SQL injection (SQLi) attacks. The queries were gathered manually from datasets on Kaggle and GitHub. The dataset contains 47,464 distinct SQL queries, both legitimate and malicious. Each entry preserves every component of the SQL query, including semicolons, single quotes, intermediate data, text fragments, and SQL keywords. Each row carries a binary label in a single column, which makes the type of query easy to identify: 1 marks an attack query and 0 a normal query. The dataset comprises 25,800 benign queries and 21,664 malicious queries (see Fig. 6). The second primary contribution of this work is a 19-column numeric training dataset: eighteen independent features plus one dependent feature, the label that designates the type of query (0 for normal, 1 for malicious). By giving every record the same set of features, the study aims to raise the accuracy and precision of machine learning algorithms. As the first step in the development process, 18 useful numerical features were extracted from typical SQLi datasets, using the text of every query in the chosen original dataset. Constants, punctuation, logical operators, the length of the query, and the number of nested queries are among the purely numerical features extracted. The resulting dataset has 47,464 records, each with 18 extracted attributes.
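To illustrate how a raw query might be turned into numeric features of the kind described above, here is a minimal Python sketch. The feature names, regular expressions, and the `extract_features` helper are assumptions made for illustration only. They are not the dataset authors' actual 18 features or extraction code, which this description does not list.

```python
import re


def extract_features(query: str) -> dict:
    """Map one SQL query string to a few illustrative numeric features.

    Hypothetical feature set: the real dataset has 18 features whose
    exact definitions are not given in the description.
    """
    q = query.lower()
    return {
        # Total number of characters in the query
        "length": len(query),
        # Punctuation that often appears in injection payloads
        "single_quotes": query.count("'"),
        "semicolons": query.count(";"),
        # Whole-word logical operators
        "logical_operators": len(re.findall(r"\b(?:and|or|not)\b", q)),
        # Standalone integer constants
        "numeric_constants": len(re.findall(r"\b\d+\b", q)),
        # Subqueries, approximated as "(" followed by SELECT
        "nested_queries": len(re.findall(r"\(\s*select\b", q)),
    }


# Example queries (made up for this sketch); label 0 = normal, 1 = attack
benign = "SELECT name FROM users WHERE id = 5"
attack = "SELECT * FROM users WHERE id = 1 OR 1=1; --"
nested = "SELECT * FROM t WHERE id IN (SELECT id FROM u)"

if __name__ == "__main__":
    for text, label in [(benign, 0), (attack, 1), (nested, 0)]:
        row = extract_features(text)
        row["label"] = label
        print(row)
```

Each resulting row is a fixed-length numeric record plus a label, which is the shape a supervised classifier expects. Simple pattern counts like these are only an approximation of SQL structure; a full SQL parser would be needed for exact counts.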
Provided by:
University of Kufa