Naren1704/phishing-dataset
Source: Hugging Face. Updated 2025-12-13. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Naren1704/phishing-dataset
Description:
Phishing datasets compiled from various sources for classification and phishing detection tasks. All datasets have been preprocessed to remove null, empty, and duplicate records, and classes have been balanced to avoid possible bias. Every dataset has the same two-column structure: `text` and `label`. The `text` field holds a URL, an SMS message, an email message, or HTML code, and each record is labeled 1 (Phishing) or 0 (Benign). The sources are a mail dataset, an SMS message dataset, a URL dataset, and a website dataset; details are also provided for the combined dataset, which comes in full and reduced versions.
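The preprocessing described above (dropping null, empty, and duplicate records, then balancing the two classes) can be sketched roughly as follows. This is an illustration only, not the dataset author's pipeline: the function name `clean_and_balance`, exact-match deduplication on `text`, and balancing by random downsampling of the larger class are all assumptions, and the sample records are made up.

```python
import random
from collections import defaultdict


def clean_and_balance(records, seed=0):
    """Hypothetical helper: clean (text, label) pairs and balance the classes.

    Assumptions (not confirmed by the dataset card): duplicates are exact
    matches on `text`, and balancing downsamples the larger class.
    """
    seen = set()
    by_label = defaultdict(list)
    for text, label in records:
        # Drop null or empty text and anything not labeled 0 or 1.
        if text is None or not text.strip() or label not in (0, 1):
            continue
        # Drop exact duplicates, keeping the first occurrence.
        if text in seen:
            continue
        seen.add(text)
        by_label[label].append((text, label))

    # Downsample both classes to the size of the smaller one.
    n = min(len(by_label[0]), len(by_label[1]))
    rng = random.Random(seed)
    balanced = []
    for label in (0, 1):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced


# Made-up records in the two-column shape: (text, label), 1 = Phishing, 0 = Benign.
sample = [
    ("http://example.com/login-verify", 1),
    ("http://example.com/login-verify", 1),  # duplicate
    ("Your parcel is waiting, confirm at http://example.test", 1),
    ("Meeting moved to 3pm", 0),
    ("", 0),      # empty
    (None, 1),    # null
    ("Urgent: account suspended", 1),
]
balanced = clean_and_balance(sample)
```

Downsampling is only one way to balance classes; it discards data from the larger class, so oversampling or class weights may be preferable when the smaller class is very small.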
Provider:
Naren1704



