"Longitudinal Benign and DGA Domain Name Dataset"

DataCite Commons · Updated 2026-04-27 · Indexed 2026-05-03
Download link:
https://ieee-dataport.org/documents/longitudinal-benign-and-dga-domain-dataset
Description:
This dataset supports longitudinal evaluation of Domain Generation Algorithm (DGA) detection under real-world concept drift. It spans nine years (2017–2025) and contains approximately 49.4 million unique benign domains and 149.4 million unique DGA domains covering 147 families.

Benign domains are sourced from Alexa Top 1M and Tranco Top 1M. Historical Alexa snapshots were retrieved via the Internet Archive Wayback Machine, and historical Tranco lists were obtained through the Tranco API. Yearly snapshots were collected from 2017 to 2025. Domains appearing in both the benign and DGA sets were removed to prevent cross-contamination.

DGA domains are sourced from DGArchive, maintained by Simon Ofner at Fraunhofer FKIE. DGArchive provides deterministic domain outputs derived directly from reverse-engineered malware algorithms and seeds, along with per-domain timestamps. Of the 151 available families, 147 were selected based on suitability for longitudinal analysis, covering both character-based (132 families) and word-based (15 families) generation schemes.

Each domain is temporally aligned to a year, making the dataset well suited to forward-chaining evaluations that reflect how detection models degrade as new DGA variants emerge over time. The dataset is intended to serve as a rigorous benchmark for researchers developing drift-resilient DGA detectors.
Provider:
IEEE DataPort
Created:
2026-04-27
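
The description mentions two steps that are easy to get subtly wrong: removing domains that appear in both the benign and DGA sets, and forward-chaining evaluation over yearly snapshots. The following is a minimal Python sketch of both, not the dataset authors' code. The record layout `(domain, year, label)`, the function names `drop_overlap` and `forward_chaining_splits`, and the choice to drop shared domains from both sets are assumptions made for illustration; the listing does not specify the file format or the exact removal rule.

```python
from typing import Iterator, List, Set, Tuple

# Assumed record layout: (domain, year, label), with label 1 = DGA, 0 = benign.
Record = Tuple[str, int, int]


def drop_overlap(benign: Set[str], dga: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Remove domains present in both sets (assumption: dropped from both)."""
    shared = benign & dga
    return benign - shared, dga - shared


def forward_chaining_splits(
    records: List[Record], first_test_year: int
) -> Iterator[Tuple[int, List[Record], List[Record]]]:
    """Yield (test_year, train, test), training only on strictly earlier years."""
    years = sorted({year for _, year, _ in records})
    for test_year in years:
        if test_year < first_test_year:
            continue
        train = [r for r in records if r[1] < test_year]
        test = [r for r in records if r[1] == test_year]
        if train and test:  # skip folds with no history or no test data
            yield test_year, train, test


if __name__ == "__main__":
    # Toy records standing in for the real data; the domains are made up.
    toy = [
        ("example.com", 2017, 0),
        ("qwxzkv.net", 2017, 1),
        ("example.org", 2018, 0),
        ("zzkqpl.biz", 2019, 1),
    ]
    for year, train, test in forward_chaining_splits(toy, first_test_year=2018):
        print(year, len(train), len(test))
```

Training only on years strictly before the test year keeps later DGA variants out of the training data, which is what lets each fold measure degradation under drift.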