Techniques and Implementation of High-Quality Threat Intelligence Acquisition from the Dark Web

中国科学数据 (China Scientific Data) | Updated: 2026-03-16 | Indexed: 2026-04-25

Download link:

https://www.sciengine.com/AA/doi/10.19678/j.issn.1000-3428.0068805


Resource description:

The dark web hides a large amount of information about cyber attacks and cybercrime. Previous studies have mainly analyzed general open-source threat intelligence or addressed a single aspect of dark web threat intelligence; they lack a systematic method for processing and analyzing dark web information and overlook its characteristics. To analyze, screen, and extract information from the vast content of the dark web, this work proposes a technique for acquiring high-quality, network-security-related threat intelligence from the dark web. It consists of four modules: information crawling, topic clustering, entity recognition, and novelty detection. Taking dark web forums as an example, a crawler targeting dark web forums collects data from multiple forums. Top2Vec embeds the forum titles and posts into the same vector space as words and documents, respectively. The discussion topics of the posts are then analyzed, and threat-intelligence-related content is selected in a coarse-grained screening step that removes noise from the crawled data. Next, named entity recognition performs fine-grained filtering, extracting threat intelligence entity words from the posts. Finally, the information content of these entity words on the open web is calculated to evaluate the importance of the extracted information and to select high-quality, network-security-related dark web threat intelligence. The experimental results show that the method is effective and can extract network threat intelligence from the collected dark web information.
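The novelty detection step scores entity words by their information content on the open web, but the description does not give the formula. The following is only a minimal sketch under stated assumptions: information content is taken to be self-information, -log2 p(w), with add-one smoothing. The frequency counts, the entity name `examplekit-x`, and the function names are hypothetical and do not come from the original work.

```python
import math


def information_content(entity, open_web_counts, total_count):
    """Self-information in bits, -log2 p(entity), with add-one smoothing.

    Assumption: p(entity) is estimated from open-web frequency counts.
    """
    vocab_size = len(open_web_counts)
    count = open_web_counts.get(entity, 0)
    # Add-one smoothing gives unseen entities a small nonzero probability.
    p = (count + 1) / (total_count + vocab_size + 1)
    return -math.log2(p)


def rank_by_novelty(entities, open_web_counts):
    """Sort entities from most to least informative (rarest on the open web first)."""
    total = sum(open_web_counts.values())
    scored = [(e, information_content(e, open_web_counts, total)) for e in entities]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical open-web frequency counts (illustrative values only).
open_web_counts = {"ransomware": 900, "phishing": 1200, "zero-day": 150}

# Hypothetical entity words extracted from dark web posts.
entities = ["phishing", "zero-day", "examplekit-x"]

ranked = rank_by_novelty(entities, open_web_counts)
for name, score in ranked:
    print(f"{name}: {score:.2f} bits")
```

Under this assumption, an entity that is rare or absent on the open web receives a higher score, which matches the idea that information seen only on the dark web is more likely to be novel intelligence.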

Created:

2026-03-16
