Dataset II.
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Dataset_II_/29855337
Description:
Border binary detection has become an important research focus in firmware security analysis for Internet of Things (IoT) devices. However, existing border binary detection methods suffer from insufficient feature characterization, high false-negative rates, and a low degree of intelligence. To mitigate these issues, we introduce BBDetector, a border binary detection method based on a multidimensional feature model (MDFM). First, we collected and analyzed a diverse set of real-world firmware to construct what is, to our knowledge, the first border binary dataset of substantial scale. To characterize border binaries comprehensively, we proposed the MDFM. Next, we extracted feature vectors from binaries via the MDFM and designed a novel stacking method for border binary detection. This method is a form of ensemble learning: it combines extreme gradient boosting, light gradient boosting machine, and categorical boosting as base learners, with random forest as the meta-learner. Finally, training on the feature vectors yielded a border binary detection model (XLC-R). We tested and evaluated BBDetector on two datasets. On the constructed representative Dataset I, XLC-R achieved a precision of 94.98%, a recall of 91.02%, and an F1 score of 92.84%. On Dataset II, BBDetector detected 3.25 times and 2.23 times more border binaries than the state-of-the-art tools Karonte and SaTC, respectively. BBDetector provides an accurate method for border binary detection in IoT firmware security analysis: it makes vulnerability detection considerably more targeted, reduces the complexity of firmware security analysis, and provides essential technical support for improving IoT device security.
Created:
2025-08-07



