hleAtKeeper/threat_detection_cli_validation_set

Name: hleAtKeeper/threat_detection_cli_validation_set
Creator: hleAtKeeper
Published: 2025-12-15 17:43:57
License: No description available

Source: Hugging Face. Updated: 2025-12-15. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/hleAtKeeper/threat_detection_cli_validation_set

Description:

This dataset contains 200 CLI (command-line interface) commands, each labeled with a risk level, for validating threat detection. It targets multi-class text classification (risk level prediction) in the domain of cybersecurity and CLI commands. The risk level distribution is: Low (80.0%), Medium (10.0%), High (5.0%), and Critical (5.0%). Each record includes the command, its flags, the full command, the risk level, the risk category, and the base reasoning behind the risk assessment. All 200 samples are allocated to training. The dataset was created by randomly sampling commands from a larger Linux commands dataset, applying stratified sampling to reach the target distribution, and supplementing the result with Critical-risk samples from the keeper-security/threat_detection dataset. The primary use cases are validating threat detection models, benchmarking CLI security classification systems, and testing model performance across risk levels. Limitations: the dataset covers only CLI commands (primarily Linux/Unix), its risk assessments are based on command syntax rather than execution context, and it may not represent all possible command variations.
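The stratified sampling step described above can be illustrated with a minimal sketch. It is not the creator's actual procedure: the `risk_level` and `full_command` keys, the helper names `target_counts` and `stratified_sample`, and the synthetic pool are all assumptions made for illustration. Only the total of 200 and the four percentages come from the description.

```python
import random
from collections import Counter

# Target distribution as stated in the description (fractions of the total).
TARGET_DISTRIBUTION = {"Low": 0.80, "Medium": 0.10, "High": 0.05, "Critical": 0.05}


def target_counts(total, distribution):
    """Convert fractional targets into whole sample counts per risk level."""
    counts = {level: round(total * share) for level, share in distribution.items()}
    if sum(counts.values()) != total:
        raise ValueError("distribution does not divide the total evenly")
    return counts


def stratified_sample(pool, counts, seed=0):
    """Draw counts[level] rows per risk level from pool, without replacement.

    `pool` is a list of dicts with a hypothetical "risk_level" key.
    """
    rng = random.Random(seed)
    by_level = {}
    for row in pool:
        by_level.setdefault(row["risk_level"], []).append(row)
    sample = []
    for level, n in counts.items():
        candidates = by_level.get(level, [])
        if len(candidates) < n:
            # A shortfall like this is where extra rows would need to be
            # supplemented from another source.
            raise ValueError(f"only {len(candidates)} rows for {level}, need {n}")
        sample.extend(rng.sample(candidates, n))
    rng.shuffle(sample)
    return sample


# Synthetic pool for illustration only; these are not rows from the dataset.
pool = [
    {"full_command": f"cmd-{level}-{i}", "risk_level": level}
    for level in TARGET_DISTRIBUTION
    for i in range(300)
]
counts = target_counts(200, TARGET_DISTRIBUTION)
sample = stratified_sample(pool, counts)
print(counts)
print(Counter(row["risk_level"] for row in sample))
```

With 200 samples, the stated percentages correspond to 160 Low, 20 Medium, 10 High, and 10 Critical commands. Because the High and Critical classes hold only 10 samples each, per-class metrics on them will be coarse.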

Provider:

hleAtKeeper
