Longitudinal Benign and DGA Domain Name Dataset
Source: DataCite Commons | Updated: 2026-04-27 | Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/longitudinal-benign-and-dga-domain-dataset
Description:
This dataset supports longitudinal evaluation of Domain Generation Algorithm (DGA) detection under real-world concept drift. It spans nine years (2017–2025) and contains approximately 49.4 million unique benign domains and 149.4 million unique DGA domains covering 147 families.

Benign domains are sourced from Alexa Top 1M and Tranco Top 1M. Historical Alexa snapshots were retrieved via the Internet Archive Wayback Machine, and historical Tranco lists were obtained through the Tranco API. Yearly snapshots were collected from 2017 to 2025. Domains appearing in both the benign and DGA sets were removed to prevent cross-contamination.

DGA domains are sourced from DGArchive, maintained by Simon Ofner at Fraunhofer FKIE. DGArchive provides deterministic domain outputs derived directly from reverse-engineered malware algorithms and seeds, along with per-domain timestamps. Of the 151 available families, 147 were selected based on suitability for longitudinal analysis, covering both character-based (132 families) and word-based (15 families) generation schemes.

Each domain is labeled with the year in which it was observed, making the dataset well suited to forward-chaining evaluations that reflect how detection models degrade as new DGA variants emerge over time. The dataset is intended to serve as a rigorous benchmark for researchers developing drift-resilient DGA detectors.
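The forward-chaining evaluation mentioned in the description can be sketched as follows. This is a minimal illustration, not the dataset authors' protocol: it assumes each record has been loaded as a `(domain, year, label)` tuple, and the record layout, the function name `forward_chaining_splits`, and the toy domains are all hypothetical. For each test year, the model would be trained only on strictly earlier years, so no future DGA variants leak into training.

```python
from typing import Iterator, List, Tuple

# Assumed record layout (not specified by the dataset page):
# (domain, year, label) with label 1 = DGA and 0 = benign.
Record = Tuple[str, int, int]


def forward_chaining_splits(
    records: List[Record],
) -> Iterator[Tuple[int, List[Record], List[Record]]]:
    """Yield (test_year, train, test) with train drawn only from earlier years."""
    years = sorted({year for _, year, _ in records})
    # The earliest year has no past data to train on, so it is never a test year.
    for test_year in years[1:]:
        train = [r for r in records if r[1] < test_year]
        test = [r for r in records if r[1] == test_year]
        yield test_year, train, test


# Toy records for illustration only; these are not taken from the dataset.
toy_records: List[Record] = [
    ("example.com", 2017, 0),
    ("qzxkvbnmpl.example", 2017, 1),
    ("example.org", 2018, 0),
    ("wjrtyhgfds.example", 2018, 1),
    ("example.net", 2019, 0),
    ("plmoknijbu.example", 2019, 1),
]

splits = list(forward_chaining_splits(toy_records))
for test_year, train, test in splits:
    print(test_year, len(train), len(test))
```

Under these assumptions, tracking a detector's score on each successive test year gives a simple view of how performance drifts as the training data ages.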
Provider:
IEEE DataPort
Created:
2026-04-27



