five

FCG-MFD: Benchmark Function Call Graph-Based Dataset for Malware Family Detection

收藏
Figshare2024-10-29 更新2026-04-28 收录
下载链接:
https://figshare.com/articles/dataset/_b_FCG-MFD_Benchmark_Function_Call_Graph-Based_Dataset_for_Malware_Family_Detection_b_/26886148
下载链接
链接失效反馈
官方服务:
资源简介:
Cyber crimes related to malware families are on the rise. This growth persists despite the prevalence of various antivirus software and approaches for malware detection and classification. Security experts have implemented Machine Learning (ML) techniques to identify these cyber-crimes. However, these approaches demand updated malware datasets for continuous improvements amid the evolving sophistication of malware strains. Thus, we present the FCG-MFD, a benchmark dataset with extensive Function Call Graphs (FCG) for malware family detection. This dataset guarantees resistance against emerging malware families by enabling security systems. Our dataset has two sub-datasets (FCG & Metadata) (1,00,000 samples) from VirusSamples, Virusshare, VirusSign, theZoo, Vx-underground, and MalwareBazaar curated using FCGs and metadata to optimize the efficacy of ML algorithms. We suggest a new malware analysis technique using FCGs and graph embedding networks, offering a solution to the complexity of feature engineering in ML-based malware analysis. Our approach to extracting semantic features via the Natural Language Processing (NLP) method is inspired by tasks involving sentences and words, respectively, for functions and instructions. We leverage a node2vec mechanism-based graph embedding network to generate malware embedding vectors. These vectors enable automated and efficient malware analysis by combining structural and semantic features. We use two datasets (FCG & Metadata) to assess FCG-MFD performance. F1-Scores of 99.14% and 99.28% are competitive with State-of-the-art (SOTA) methods.
创建时间:
2024-10-29
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作