puyang2025/waf_data_v2

Name: puyang2025/waf_data_v2
Creator: puyang2025
Published: 2026-01-14 23:38:32
License: MIT (as stated in the dataset card)

Source: Hugging Face. Updated 2026-01-14; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/puyang2025/waf_data_v2


Resource description:

---
pretty_name: WAF Data v2
language:
- en
license: mit
task_categories:
- text-classification
task_ids:
- multi-class-classification
dataset_type: web-application-firewall
tags:
- http
- security
- requests
- waf
size_categories:
- 1M<n<10M
---

# WAF Data v2

## Dataset Summary

This dataset contains HTTP request records with labels for security-related classification tasks. Each row includes request metadata and content fields, plus a string label.

## Supported Tasks and Leaderboards

- text-classification: classify HTTP request records based on the `label` field

## Languages

- English (`en`) for textual tokens in headers/body; protocol and method fields use standard HTTP tokens

## Dataset Structure

### Data Instances

Each instance is a single HTTP request:

- `method`: HTTP method (string)
- `url`: request URL (string)
- `protocol`: HTTP protocol/version (string)
- `headers`: serialized headers (string)
- `body`: request body (string)
- `label`: classification label (string)
  - `normal`: benign/legitimate request
  - `anomalous`: suspicious or attack-like request

### Data Files

The dataset is stored as Parquet files with the following splits:

- `train.parquet`: 1,924,461 rows
- `eval.parquet`: 240,560 rows
- `test.parquet`: 240,558 rows
- `heldout.parquet`: 147,188 rows

### Data Fields

All columns are stored as strings in the Parquet schema:

- `method` (string)
- `url` (string)
- `protocol` (string)
- `headers` (string)
- `body` (string)
- `label` (string)

## Data Preparation

JSON source files were converted to Parquet for efficient storage and access. The Parquet files are the canonical dataset artifacts.

## Licensing Information

MIT License.

## How to Load

```python
from datasets import load_dataset

data_files = {
    "train": "train.parquet",
    "validation": "eval.parquet",
    "test": "test.parquet",
    "heldout": "heldout.parquet",
}

ds = load_dataset("parquet", data_files=data_files)
```

## Citation Information

If you use this dataset, please add your citation here.
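## Example: Checking Records Against the Card

The following is a minimal sketch, not part of the original dataset card, of how a row could be checked against the schema and label set described above, and how the relative size of each split follows from the listed row counts. The helper names (`validate_record`, `split_fractions`) and the sample record are hypothetical; in particular, the format of the serialized `headers` string is not specified by the card, so the value shown is only a placeholder.

```python
# Column names and label values as listed in the dataset card.
EXPECTED_FIELDS = ("method", "url", "protocol", "headers", "body", "label")
EXPECTED_LABELS = {"normal", "anomalous"}

# Row counts per Parquet file, copied from the "Data Files" section.
SPLIT_ROWS = {
    "train": 1_924_461,
    "eval": 240_560,
    "test": 240_558,
    "heldout": 147_188,
}


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in EXPECTED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], str):
            problems.append(f"non-string field: {field}")
    label = record.get("label")
    if isinstance(label, str) and label not in EXPECTED_LABELS:
        problems.append(f"unexpected label: {label}")
    return problems


def split_fractions(split_rows):
    """Return each split's share of the total row count."""
    total = sum(split_rows.values())
    return {name: rows / total for name, rows in split_rows.items()}


# Hypothetical record for illustration only; not taken from the dataset.
sample = {
    "method": "GET",
    "url": "/index.html",
    "protocol": "HTTP/1.1",
    "headers": "Host: example.com",  # assumed serialization, not documented
    "body": "",
    "label": "normal",
}

if __name__ == "__main__":
    print(validate_record(sample))
    for name, fraction in split_fractions(SPLIT_ROWS).items():
        print(f"{name}: {fraction:.2%}")
```

The same `validate_record` check could be applied to rows of the `ds` object returned by `load_dataset`, since each row is exposed as a dictionary keyed by column name.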


Provider:

puyang2025
