Euroswarms/blackarch
Source: Hugging Face. Updated 2026-04-25. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/Euroswarms/blackarch
Description:
A supervised fine-tuning (SFT) dataset that teaches a language model how to use the ~2,850 CLI tools in the BlackArch Linux repository: what each tool does, how to install it, which flags matter, and practical command examples. Coverage spans 49 tool categories, from web application testing to reverse engineering to wireless exploitation. Sources include BlackArch tools index data (~2,855 entries with name, version, description, category, and URL), an embedded corpus for 86 major tools (5–25 hand-written (question, command) pairs per tool covering common operations, flag combinations, and output modes), and code blocks extracted from each tool's GitHub README as supplemental command examples. The dataset also includes system prompts (e.g., tool use expert, tool explain expert, tool install expert) and several row kinds (e.g., tool what is, tool install, major tool use). Augmentation such as oversampling is supported, so the dataset can scale from ~6,200 base rows up to 1,000,000 rows.
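The oversampling step mentioned in the description can be pictured with a minimal sketch. This is not the dataset's actual generator: the `oversample` helper, the row field names (`kind`, `system`, `question`, `answer`), and the sample rows are all assumptions made for illustration. It keeps every base row once and then draws random repeats until the target size is reached.

```python
import random

def oversample(base_rows, target_size, seed=0):
    """Return target_size rows: each base row once, then seeded random repeats.

    Hypothetical helper; the real augmentation logic is not documented here.
    """
    if not base_rows:
        raise ValueError("base_rows must not be empty")
    if target_size <= len(base_rows):
        # Nothing to oversample: just truncate to the requested size.
        return list(base_rows[:target_size])
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    extra = rng.choices(base_rows, k=target_size - len(base_rows))
    return list(base_rows) + extra

# Illustrative rows only; the field names and values are assumed, not taken from the dataset.
base = [
    {"kind": "tool_what_is", "system": "tool explain expert",
     "question": "What is nmap?", "answer": "A network scanner."},
    {"kind": "tool_install", "system": "tool install expert",
     "question": "How do I install nmap?", "answer": "sudo pacman -S nmap"},
]

rows = oversample(base, 10)
print(len(rows))
```

Sampling with replacement only repeats existing rows, so a set scaled this way from ~6,200 base rows to 1,000,000 rows would contain many exact duplicates unless the real pipeline also varies prompts or phrasing.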
Provider:
Euroswarms



