MH-1M Dataset
Source: Figshare | Updated: 2025-02-06 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/_b_MH-1M_Dataset_b_/28355897
Description:
The rapid and widespread growth of Android malware presents substantial obstacles to cybersecurity research. To advance malware research, we present the MH-1M dataset, a compilation of 1,340,515 APK samples with a wide range of attributes and metadata. Threat labels are obtained through the VirusTotal API, which combines the results of multiple detection techniques. Our research indicates that MH-1M is a highly current dataset that provides valuable insight into the changing nature of malware.

MH-1M consists of 23,247 features that cover a wide range of application behavior, from intents::accept to apicalls::landroid/window/splashscreenview.remove. The features fall into four primary categories:

Feature type    Count
APICalls        22,394
Intents         407
OPCodes         232
Permissions     214

The dataset occupies 29.0 GB, a substantial yet manageable size. It consists of 1,221,421 benign applications and 119,094 malware applications, so malware makes up roughly 9% of the samples.

The MH-1M repository also offers a wide variety of APK metadata, providing useful data on the development of malicious software over a period of more than ten years. The metadata includes SHA256 hashes, file names, package names, compilation APIs, and various other details. This GitHub repository contains over 400 GB of data, which the authors describe as the largest and most comprehensive dataset available for advancing research and development in Android malware detection.
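The figures quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and does not load the dataset: the constants are copied from the description, and the helper `feature_category` is a hypothetical function that assumes every feature name follows the `category::name` pattern seen in the two examples given (`intents::accept` and `apicalls::...`). The exact prefixes used for opcode and permission features are not stated in the description.

```python
from collections import Counter

# Per-category feature counts, copied from the dataset description.
FEATURE_COUNTS = {
    "APICalls": 22_394,
    "Intents": 407,
    "OPCodes": 232,
    "Permissions": 214,
}

# Class sizes, copied from the dataset description.
BENIGN = 1_221_421
MALWARE = 119_094


def feature_category(name: str) -> str:
    """Return the text before '::' in a feature name.

    Assumption: names look like 'intents::accept'. This pattern is
    inferred from the two examples in the description, not from a
    documented schema.
    """
    return name.split("::", 1)[0]


# Totals implied by the per-category and per-class figures.
total_features = sum(FEATURE_COUNTS.values())
total_samples = BENIGN + MALWARE
malware_share = MALWARE / total_samples

print(f"features: {total_features}")
print(f"samples:  {total_samples}")
print(f"malware share: {malware_share:.1%}")

# Group the two feature names quoted in the description by prefix.
examples = [
    "intents::accept",
    "apicalls::landroid/window/splashscreenview.remove",
]
print(Counter(feature_category(n) for n in examples))
```

The per-category counts sum to the stated 23,247 features and the two classes sum to the stated 1,340,515 samples. The malware share is useful to keep in mind when choosing evaluation metrics, since plain accuracy can look high on a dataset where one class dominates.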
Created:
2025-02-06



