
williegeodev/dga-prediction-muticlass-dataset

Source: Hugging Face. Updated 2026-04-25; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/williegeodev/dga-prediction-muticlass-dataset
Description:
License: apache-2.0

This dataset contains approximately 57,000 domain name records, each with 20 handcrafted lexical features extracted from the second-level domain string. It was constructed for DGA botnet family attribution across 27 malware families and one benign class. DGA domain strings were generated by executing the open-source family implementations in the baderj/domain_generation_algorithms repository, with each family capped at 3,000 samples to limit class dominance. Benign domains were sampled from the Tranco top-one-million list to match the total malicious sample count. The dataset supports a hierarchical classification design in which 15 major families with 1,000 or more samples are distinguished at Stage 2, and 11 minor families with between 100 and 703 samples are attributed at Stage 3. In all associated experiments, the dataset was partitioned using an 80/20 stratified train-test split with random_state=42.
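The description does not list the 20 lexical features, so the following is only a minimal sketch of what features computed from a second-level domain string can look like. The function names (`second_level_label`, `shannon_entropy`, `lexical_features`) and the five features shown are illustrative assumptions, not the dataset's actual feature set. The label extraction is also simplified: it takes the label immediately left of the last dot and does not handle multi-label public suffixes such as `co.uk`.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def second_level_label(domain: str) -> str:
    """Return the label left of the TLD (naive: ignores suffixes like co.uk)."""
    labels = domain.lower().strip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def shannon_entropy(s: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def lexical_features(domain: str) -> dict:
    """Hypothetical lexical features; NOT the dataset's documented 20."""
    sld = second_level_label(domain)
    n = len(sld)
    return {
        "length": n,
        "digit_ratio": sum(ch.isdigit() for ch in sld) / n if n else 0.0,
        "vowel_ratio": sum(ch in VOWELS for ch in sld) / n if n else 0.0,
        "unique_chars": len(set(sld)),
        "entropy": shannon_entropy(sld),
    }
```

Calling `lexical_features("www.example.com")` computes the features over the label `example`. Algorithmically generated labels often show higher entropy and lower vowel ratios than dictionary-like benign names, which is why features of this kind are commonly used for DGA detection.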
Provider:
williegeodev