CompPhish

Indexed by NIAID Data Ecosystem on 2026-05-10
Download link:
https://data.mendeley.com/datasets/fmbs4kp9wz
Description:
About the dataset: CompPhish is a phishing dataset of labelled phishing and legitimate URLs, together with the HTML source of each page. Each URL and its HTML file share the same serial number. The dataset contains 15,358 samples: 7,204 phishing and 8,154 legitimate.

Data collection: Phishing URLs were collected from the PhishTank and OpenPhish repositories, and legitimate URLs from the DataForSEO Top-1000 websites list. The HTML source of each URL was downloaded with Python by visiting the URL while it was still live.

Labels: 0 denotes legitimate and 1 denotes phishing.

Features: 70 features are extracted from the raw URLs and their HTML source. They cover several types of phishing attack: URL-based phishing, brand-jacking, phishing sites hosted on compromised domains (PSHCD), and links to auto-downloadable malicious files.

Usage: Researchers can use the processed dataset to apply machine-learning algorithms or feature-selection techniques. The raw URLs and their HTML source can also be used to extract new features and to develop new detection methods.
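The description gives the 0/1 label encoding, the class counts, and mentions URL-based features. The following is a minimal Python sketch, under stated assumptions, that computes the class balance from the published counts (including the majority-class baseline accuracy a classifier would need to beat) and extracts a few simple lexical features from a raw URL. The helpers `class_balance` and `url_features`, their feature names, and the sample URLs are hypothetical illustrations; they are not the 70 features distributed with CompPhish.

```python
import ipaddress
from urllib.parse import urlparse

# Label encoding as stated in the description: 0 = legitimate, 1 = phishing.
LABELS = {0: "legitimate", 1: "phishing"}


def class_balance(n_phishing: int, n_legitimate: int) -> dict:
    """Summarize class proportions (hypothetical helper)."""
    total = n_phishing + n_legitimate
    return {
        "total": total,
        "phishing_share": n_phishing / total,
        # Accuracy obtained by always predicting the majority class.
        "majority_baseline": max(n_phishing, n_legitimate) / total,
    }


def url_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a raw URL.

    Hypothetical examples of URL-based features; NOT the dataset's own 70.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for domain names
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    print(class_balance(n_phishing=7204, n_legitimate=8154))
    # Made-up sample URLs for illustration only.
    samples = [
        ("https://www.example.com/login", 0),
        ("http://192.168.0.1/secure-update@paypal", 1),
    ]
    for url, label in samples:
        print(LABELS[label], url_features(url))
```

With the published counts, phishing samples make up about 46.9% of the data, so always predicting "legitimate" already reaches about 53.1% accuracy; reported detection results are more informative when compared against that baseline.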
Created: 2025-12-15