"X-phide phishing and legitimate urls with features"
Source: DataCite Commons | Updated: 2025-10-14 | Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/custom-phishing-and-legitimate-urls-features
Description:
This dataset collection has been developed to support research in phishing URL detection by providing a comprehensive, diverse, and up-to-date set of phishing and legitimate web addresses. It consists of three datasets designed for both model training and cross-dataset evaluation.

The first dataset is a custom dataset created by merging data from four publicly available sources: PhishTank (March 2024), AdaURL (March 2023), the Kaggle “Phishing URLs Dataset” by Hassaan Mustafavi (January 2025), and PhishDataset (May to June 2021). This combination provides a balanced and representative sample of phishing and legitimate URLs.

The second dataset, sourced from GramBeddings, was used to analyse feature relevance and selection using the same model training algorithm as for the custom dataset. Through feature correlation and recursive feature elimination, eight key features were identified from ninety available, highlighting differences in feature importance across datasets.

The third dataset, the PhiUSIIL dataset (134,850 legitimate and 100,945 phishing URLs), was employed primarily for testing and cross-dataset generalisation analysis. Evaluation revealed variations in feature contribution; for example, HTTPS protocol presence proved highly discriminative in PhiUSIIL but less effective in the GramBeddings dataset.

This dataset collection enables robust model training, feature analysis, and cross-dataset performance evaluation for phishing URL detection tasks, supporting further research on real-world phishing threats and model generalisability.
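The kind of comparison described above, in which a feature such as HTTPS presence separates the classes well in one dataset but poorly in another, can be illustrated with a minimal sketch. It makes several assumptions: the feature names (`uses_https`, `url_length`, and so on) are hypothetical and are not the ninety features shipped with the datasets, labels are assumed to be 0 for legitimate and 1 for phishing, and the rate gap is only a rough stand-in for the feature correlation and recursive feature elimination analysis the description mentions.

```python
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Return a few illustrative lexical features for one URL.

    The feature names are hypothetical; they are not the dataset's own columns.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "uses_https": int(parts.scheme == "https"),
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
    }


def https_rate_gap(rows) -> float:
    """HTTPS usage rate of legitimate URLs minus that of phishing URLs.

    `rows` is an iterable of (url, label) pairs, with the assumed convention
    0 = legitimate and 1 = phishing. Both classes must be present.
    """
    rows = list(rows)
    rates = {}
    for label in (0, 1):
        urls = [u for u, y in rows if y == label]
        rates[label] = sum(url_features(u)["uses_https"] for u in urls) / len(urls)
    return rates[0] - rates[1]


if __name__ == "__main__":
    # Toy rows for illustration only; not taken from any of the datasets.
    toy = [
        ("https://a.example/", 0),
        ("https://b.example/", 0),
        ("http://c.example/login", 1),
        ("https://d.example/verify", 1),
    ]
    print(url_features("https://example.com/login"))
    print(https_rate_gap(toy))
```

Computing `https_rate_gap` separately on each dataset gives a quick check of whether HTTPS presence carries a similar signal in both: a gap near zero suggests the feature contributes little in that dataset, while a large gap suggests it is discriminative there.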
Provider:
IEEE DataPort
Created:
2025-10-14



