
Fraudulent E-mail Corpus

Source: www.kaggle.com (updated 2017-07-25; indexed 2025-03-23)
Download link:
https://www.kaggle.com/rtatman/fraudulent-email-corpus
Description:
### Context

Fraudulent e-mails contain criminally deceptive information, usually with the intent of convincing the recipient to give the sender a large amount of money. Perhaps the best-known type of fraudulent e-mail is the [Nigerian Letter or “419”](https://www.fbi.gov/scams-and-safety/common-fraud-schemes/nigerian-letter-or-419-fraud) fraud.

### Content

This dataset is a collection of more than 2,500 "Nigerian" fraud letters, dating from 1998 to 2007. The e-mails are stored in a single text file. Each e-mail has a header that includes the following fields:

* Return-Path: the address the e-mail was sent from
* X-Sieve: the X-Sieve host (always cmu-sieve 2.0)
* Message-Id: a unique identifier for each message
* From: the message sender (sometimes blank)
* Reply-To: the e-mail address to which replies will be sent
* To: the e-mail address to which the e-mail was originally sent (some are truncated for anonymity)
* Date: the date the e-mail was sent
* Subject: the subject line of the e-mail
* X-Mailer: the mail client or platform the e-mail was sent from
* MIME-Version: the Multipurpose Internet Mail Extensions version
* Content-Type: the content type and character encoding
* Content-Transfer-Encoding: the transfer encoding of the body (e.g., 7bit or 8bit)
* X-MIME-Autoconverted: the type of automatic conversion applied
* Status: r (read) and o (opened)

### Acknowledgements

If you use this collection of fraud e-mail in your research, please include the following citation in any resulting papers:

> Radev, D. (2008), CLAIR collection of fraud email, ACL Data and Code Repository, ADCR2008T001, http://aclweb.org/aclwiki

### Inspiration

* This dataset contains fraudulent e-mails sent over a period of years. Has the language used in fraudulent e-mails changed over time?
* Are there any words or phrases that are particularly common in this type of e-mail? (You might compare it with the Enron e-mail corpus, linked below.)

### Related datasets

* https://www.kaggle.com/wcukierski/enron-email-dataset
* https://www.kaggle.com/uciml/sms-spam-collection-dataset
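Because all messages sit in one text file with standard mail headers, they can be loaded with ordinary mail-parsing tools. The following is a minimal sketch, not an official loader: it assumes the file is in mbox format (each message begins with a `From ` separator line) and uses Python's standard-library `mailbox` and `email.utils` modules. The helper names (`load_emails`, `top_words_by_year`), the file name `fraudulent_emails.txt`, and any stopword list are hypothetical. It extracts the header fields listed above, the year from the `Date` header, and the plain-text body, then counts frequent words per year as a first pass at the Inspiration questions.

```python
"""Sketch: load the fraud-email corpus and count frequent words per year.

Assumption: the single text file is in mbox format (messages start with a
"From " separator line). All helper names here are hypothetical.
"""
import mailbox
import re
from collections import Counter, defaultdict
from email.utils import parsedate_to_datetime

# A subset of the header fields described in the dataset card.
HEADER_FIELDS = ("Return-Path", "Message-Id", "From", "Reply-To", "To",
                 "Date", "Subject", "X-Mailer", "Content-Type", "Status")

WORD_RE = re.compile(r"[a-z']+")


def body_text(msg):
    """Concatenate the text/plain parts of a message, decoded leniently."""
    chunks = []
    for part in msg.walk():  # also yields msg itself when it is not multipart
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None
        if not payload:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # charset name not known to Python
            chunks.append(payload.decode("latin-1", errors="replace"))
    return "\n".join(chunks)


def parse_year(date_header):
    """Return the year from a Date header, or None if it cannot be parsed."""
    if not date_header:
        return None
    try:
        return parsedate_to_datetime(str(date_header)).year
    except (TypeError, ValueError):  # exception type differs across versions
        return None


def load_emails(path):
    """Return one dict per message: selected headers, 'year', and 'body'."""
    box = mailbox.mbox(path, create=False)
    try:
        records = []
        for msg in box:
            rec = {f: (str(msg[f]) if msg[f] is not None else None)
                   for f in HEADER_FIELDS}
            rec["year"] = parse_year(msg["Date"])
            rec["body"] = body_text(msg)
            records.append(rec)
        return records
    finally:
        box.close()


def top_words_by_year(records, n=10, stopwords=frozenset()):
    """Map each year to its n most common body words of 3+ letters."""
    counts = defaultdict(Counter)
    for rec in records:
        if rec["year"] is None:  # skip messages with unparseable dates
            continue
        words = (w for w in WORD_RE.findall(rec["body"].lower())
                 if len(w) > 2 and w not in stopwords)
        counts[rec["year"]].update(words)
    return {year: c.most_common(n) for year, c in sorted(counts.items())}
```

A typical call would be `top_words_by_year(load_emails("fraudulent_emails.txt"), n=20, stopwords=...)` with a stopword set of your choosing. Bodies are decoded with a Latin-1 fallback because older messages may omit or misstate their charset, and messages whose `Date` cannot be parsed are skipped rather than guessed.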

Provided by:
www.kaggle.com