
Contact Form Spam

Source: www.kaggle.com · Updated 2024-07-05 · Indexed 2025-03-25
Download link:
https://www.kaggle.com/frankcorso/contact-form-spam
Description:
This dataset contains over 2,000 spam contact form submissions collected across several of my websites. It can be used to build classification ML models or other spam filters. I am still cleaning up a few other rounds of data, so I may later update it to include a few thousand more submissions.

A few notes about the dataset:

* All of these forms contained three fields: email, name, and message, though the labels differed slightly from site to site. For example, one site labeled the name field "Name", whereas another labeled it "What is your name?".
* In case any real people, rather than just bots, submitted this spam, the dataset omits the "email" field from all submissions.
* The dataset does not include non-spam submissions (AKA ham), so it should be combined with your own data or other datasets.
* Some spam bots submitted multiple times over the course of a few days or weeks, so the dataset may contain duplicate submissions or submissions with very subtle differences.
* Some bots, such as one about Bitcoin, put a message into the "Name" field and then some random alphanumeric characters into the "message" field. The dataset provides the data as entered.

**License Info**

Most of my datasets, models, and research, including this one, are published under [the CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/), which means you can use this however you want for non-commercial purposes as long as you provide attribution. If you require a commercial license, [connect with me on my site](https://frankcorso.me/).

Dataset image by [vectorjuice on Freepik](https://www.freepik.com/free-vector/businessmen-get-advertising-phishing-spreading-malware-irrelevant-unsolicited-spam-message-spam-unsolicited-messages-malware-spreading-concept_11667625.htm#fromView=search&page=1&position=0&uuid=ef3f553d-fd18-4d22-b4a4-39b96fd9cca1)
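The dataset notes about missing ham data, repeated submissions, and bots that put their message in the name field suggest a few preprocessing steps before training a classifier. The sketch below is one possible approach, not the author's method. It assumes each submission has been loaded as a dict with hypothetical `name` and `message` keys (the real file's column names may differ), and the helper names `normalize`, `dedupe`, `fields_look_swapped`, and `to_examples` are illustrative only.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical row shape: one dict per submission with "name" and "message" keys.
# The column names in the actual Kaggle file may differ; adjust after inspecting it.
Row = Dict[str, str]

_WHITESPACE = re.compile(r"\s+")
# Crude stand-in for "random alpha-numeric characters": one unbroken token of 6+ letters/digits.
_RANDOM_TOKEN = re.compile(r"[A-Za-z0-9]{6,}")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return _WHITESPACE.sub(" ", text).strip().lower()


def dedupe(rows: List[Row]) -> List[Row]:
    """Keep the first submission for each normalized (name, message) pair."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        key = (normalize(row.get("name", "")), normalize(row.get("message", "")))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fields_look_swapped(row: Row, min_name_words: int = 5) -> bool:
    """Flag rows whose name reads like a sentence and whose message is one random-looking token."""
    name_words = row.get("name", "").split()
    message = row.get("message", "").strip()
    return len(name_words) >= min_name_words and _RANDOM_TOKEN.fullmatch(message) is not None


def to_examples(spam: List[Row], ham: List[Row]) -> List[Tuple[str, int]]:
    """Join name and message into one text and label spam 1, ham 0.

    The name is kept because some bots put their message in the name field.
    Ham rows must come from your own data or another dataset.
    """
    def text(row: Row) -> str:
        return normalize(f"{row.get('name', '')} {row.get('message', '')}")

    return [(text(r), 1) for r in spam] + [(text(r), 0) for r in ham]


if __name__ == "__main__":
    # Made-up toy rows, not taken from the dataset.
    spam = [
        {"name": "Bob", "message": "Buy cheap  SEO now"},
        {"name": "bob", "message": "buy cheap seo now "},
        {"name": "Earn Bitcoin fast with our new trading bot", "message": "x7Fq92LkA1"},
    ]
    ham = [{"name": "Alice", "message": "Question about your article"}]

    unique = dedupe(spam)
    print(len(unique))  # 2
    print([fields_look_swapped(r) for r in unique])  # [False, True]
    print(to_examples(unique, ham))
```

Exact matching after normalization only catches copies that differ in case or spacing; submissions with subtler edits (a changed link or word) would need fuzzy matching, for example `difflib.SequenceMatcher` from the Python standard library. The 5-word and 6-character thresholds in the swapped-field check are arbitrary assumptions and should be tuned against the actual data.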

Provider:
www.kaggle.com