DarkPT-BR: Labeled Posts from Brazilian Portuguese Dark Web Forums
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/33kff5gb3h
Description:
DarkPT-BR is a labeled dataset containing posts collected from Brazilian Portuguese forums on the Dark Web. It supports research in Cyber Threat Intelligence (CTI) and malicious content detection. The dataset includes binary and probabilistic labels for relevance, based on the presence of Indicators of Compromise (IoCs), contextual keywords, and manual expert analysis.
It is structured in three versions:
Dataset I: Initial labeling via heuristics (IoCs + keywords). Contains 17,675 posts from the Dark Web (1,665 Relevant, 16,010 Not Relevant).
Dataset II: Manually revised and expanded version. Contains 26,575 posts from the Dark Web (3,341 Relevant, 23,234 Not Relevant). Note: All posts from Dataset I are included in Dataset II.
Dataset III: Contains 7,498 previously unlabeled posts from the Dark Web. Labels were assigned by the post identification model developed by the authors, which gives each post a relevance probability between 0 and 1.
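The class counts listed above show a strong imbalance toward Not Relevant posts, which matters when training a classifier on Dataset I or II. The following is a minimal sketch, not part of the published code, that checks the reported totals, computes the Relevant share, and derives "balanced" class weights using the common heuristic n_samples / (n_classes * n_samples_in_class). The names `COUNTS`, `summarize`, and `binarize` are hypothetical, and the 0.5 cutoff for turning Dataset III probabilities into binary labels is an assumption, not a value stated by the authors.

```python
# Class counts copied from the dataset description above.
COUNTS = {
    "Dataset I": {"relevant": 1665, "not_relevant": 16010, "total": 17675},
    "Dataset II": {"relevant": 3341, "not_relevant": 23234, "total": 26575},
}


def summarize(counts):
    """Return the Relevant share and balanced class weights for one version."""
    n = counts["relevant"] + counts["not_relevant"]
    if n != counts["total"]:
        raise ValueError("class counts do not add up to the reported total")
    # Balanced weighting: n_samples / (n_classes * n_samples_in_class).
    return {
        "relevant_share": counts["relevant"] / n,
        "weight_relevant": n / (2 * counts["relevant"]),
        "weight_not_relevant": n / (2 * counts["not_relevant"]),
    }


def binarize(probabilities, threshold=0.5):
    """Map Dataset III-style probabilities to 0/1 labels.

    The default threshold is an assumption made for illustration only.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


SUMMARY = {name: summarize(c) for name, c in COUNTS.items()}

if __name__ == "__main__":
    for name, stats in SUMMARY.items():
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

With these counts, Relevant posts make up roughly 9% of Dataset I and roughly 13% of Dataset II, so the Relevant class receives the larger weight in both cases.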
This dataset can support machine learning research in threat classification, text mining, and cybersecurity automation. The code associated with the data is available at https://github.com/sebastiaoafilho/Malicious_Posts_Identification
Created:
2025-08-05



