williegeodev/dga-prediction-muticlass-dataset
Source: Hugging Face | Updated: 2026-04-25 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/williegeodev/dga-prediction-muticlass-dataset
Description:
---
license: apache-2.0
---
This dataset contains approximately 57,000 domain name records with 20 handcrafted lexical features extracted from second-level domain strings, constructed for DGA botnet family attribution across 27 malware families and one benign class. DGA domain strings were generated by executing the open-source family implementations in the baderj/domain_generation_algorithms repository, with each family capped at 3,000 samples to limit class dominance. Benign domains were sampled from the Tranco top-one-million list to match the total malicious sample count. The dataset supports a hierarchical classification design in which 15 major families with 1,000 or more samples are distinguished at Stage 2, and 11 minor families with between 100 and 703 samples are attributed at Stage 3. The dataset was partitioned using an 80/20 stratified train-test split with random_state=42 in all associated experiments.
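The per-family cap and the 80/20 stratified split described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the dataset author's code: the function names `cap_per_family` and `stratified_split`, the `(domain, label)` record layout, and the synthetic records are all assumptions made for the example. The `random_state=42` in the description suggests a scikit-learn style splitter was used in the original experiments, so this sketch will not reproduce the exact same partition.

```python
import random
from collections import defaultdict


def cap_per_family(records, cap=3000, seed=42):
    """Keep at most `cap` randomly chosen (domain, label) records per label.

    Hypothetical helper: mirrors the 3,000-sample cap described in the text.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    capped = []
    for label in sorted(by_label):  # sorted for a deterministic order
        items = by_label[label]
        if len(items) > cap:
            items = rng.sample(items, cap)
        capped.extend(items)
    return capped


def stratified_split(records, test_fraction=0.2, seed=42):
    """Split records so each label keeps roughly the same train/test ratio.

    Hypothetical helper: an approximation of a stratified 80/20 split,
    not a reimplementation of any particular library's splitter.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    train, test = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


if __name__ == "__main__":
    # Synthetic placeholder records; real rows would come from the dataset.
    records = [(f"a{i}.example", "family_a") for i in range(5000)]
    records += [(f"b{i}.example", "family_b") for i in range(500)]

    capped = cap_per_family(records)
    train, test = stratified_split(capped)
    print(len(capped), len(train), len(test))
```

Splitting within each label, rather than over the pooled records, is what keeps the small families represented in both partitions.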
Provider:
williegeodev



