Fraudulent E-mail Corpus
Source: www.kaggle.com · Updated 2017-07-25 · Indexed 2025-03-23
Download link:
https://www.kaggle.com/rtatman/fraudulent-email-corpus
Description:
### Context:
Fraudulent e-mails contain criminally deceptive information, usually with the intent of convincing the recipient to give the sender a large amount of money. Perhaps the best-known type of fraudulent e-mail is the [Nigerian Letter or “419”](https://www.fbi.gov/scams-and-safety/common-fraud-schemes/nigerian-letter-or-419-fraud) fraud.
### Content:
This dataset is a collection of more than 2,500 "Nigerian" Fraud Letters, dating from 1998 to 2007.
These emails are in a single text file. Each e-mail has a header which includes the following information:
* Return-Path: address the email was sent from
* X-Sieve: the X-Sieve host (always cmu-sieve 2.0)
* Message-Id: a unique identifier for each message
* From: the message sender (sometimes blank)
* Reply-To: the email address to which replies will be sent
* To: the email address to which the e-mail was originally sent (some are truncated for anonymity)
* Date: Date e-mail was sent
* Subject: Subject line of e-mail
* X-Mailer: The platform the e-mail was sent from
* MIME-Version: The Multipurpose Internet Mail Extension version
* Content-Type: type of content & character encoding
* Content-Transfer-Encoding: encoding in bits
* X-MIME-Autoconverted: the type of autoconversion done
* Status: message status flags; in the common mbox convention, R means read and O means old (already seen by the mail client)
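
As a rough illustration of how the header fields listed above might be read, here is a minimal sketch using Python's standard `email` module. It assumes the text file follows an mbox-like layout in which every message begins with a separator line starting with `From ` (this layout is an assumption, not something the description states). The function names and the `SAMPLE` text are hypothetical and exist only for this example.

```python
import email
from typing import Dict, List, Optional


def split_messages(raw: str) -> List[str]:
    """Split mbox-style text into one string per message.

    Assumption: each message starts with a separator line beginning
    with "From " (note the trailing space, which a "From:" header lacks).
    """
    chunks: List[List[str]] = []
    for line in raw.splitlines():
        if line.startswith("From "):
            chunks.append([])          # start a new message
        elif chunks:
            chunks[-1].append(line)    # header or body line of current message
    return ["\n".join(chunk) for chunk in chunks]


def parse_headers(raw: str) -> List[Dict[str, Optional[str]]]:
    """Return a few header fields and the body of every message."""
    records = []
    for chunk in split_messages(raw):
        msg = email.message_from_string(chunk)
        records.append({
            "from": msg.get("From"),          # None when the header is absent
            "reply_to": msg.get("Reply-To"),
            "date": msg.get("Date"),
            "subject": msg.get("Subject"),
            "body": "" if msg.is_multipart() else msg.get_payload(),
        })
    return records


# Hypothetical two-message sample, not taken from the dataset.
SAMPLE = """From r  Thu Oct 31 08:11:39 2002
Return-Path: <sender@example.com>
From: "Example Sender" <sender@example.com>
Reply-To: reply@example.com
Date: Thu, 31 Oct 2002 02:38:20 +0000
Subject: URGENT ASSISTANCE

Dear friend, please reply.
From r  Thu Oct 31 17:27:16 2002
Return-Path: <other@example.com>
Date: Thu, 31 Oct 2002 17:27:16 +0000
Subject: BUSINESS PROPOSAL

I write to seek your cooperation.
"""

records = parse_headers(SAMPLE)
```

To run this on the real corpus, the file's text would be read (probably with a permissive setting such as `errors="ignore"`, since the encoding is not specified here) and passed to `parse_headers`. Body lines that themselves start with `From ` would be split incorrectly by this sketch.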
### Acknowledgements:
If you use this collection of fraud email in your research, please include the following citation in any resulting papers:
> Radev, D. (2008), CLAIR collection of fraud email, ACL Data and Code Repository, ADCR2008T001, http://aclweb.org/aclwiki
### Inspiration:
* This dataset contains fraudulent e-mails sent over a period of years. Has the language used in fraudulent E-mails changed over time?
* Are there any words or phrases that are particularly common in this type of e-mail? (You might compare it with the Enron email corpus, linked below)
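
One possible starting point for both questions is to count words per sending year. The sketch below assumes each message is already available as a pair of its `Date` header value and its body text. The helper names, the simple tokenizer, and the sample messages are hypothetical choices for illustration, not part of the dataset.

```python
import re
from collections import Counter, defaultdict
from email.utils import parsedate_to_datetime
from typing import Dict, Iterable, Optional, Tuple

# Very simple tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def year_of(date_header: Optional[str]) -> Optional[int]:
    """Return the year from an RFC 2822 style Date header, or None if unparseable."""
    if not date_header:
        return None
    try:
        return parsedate_to_datetime(date_header).year
    except (TypeError, ValueError):
        # Different Python versions raise different errors for bad dates.
        return None


def word_counts_by_year(
    messages: Iterable[Tuple[Optional[str], str]]
) -> Dict[int, Counter]:
    """Group word frequencies by the year in which each message was sent."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for date_header, body in messages:
        year = year_of(date_header)
        if year is None:
            continue  # skip messages with a missing or malformed date
        counts[year].update(TOKEN_RE.findall(body.lower()))
    return dict(counts)


# Hypothetical messages, not taken from the dataset.
messages = [
    ("Thu, 31 Oct 2002 02:38:20 +0000", "Urgent business proposal, urgent reply"),
    ("Mon, 05 Mar 2007 10:00:00 +0000", "Lottery winner! Claim your lottery prize"),
    ("not a date", "ignored text"),
    (None, "also ignored"),
]
counts = word_counts_by_year(messages)
```

Calling `counts[year].most_common(20)` for each year gives a first look at how vocabulary shifts over time. A serious comparison would likely also remove stop words and normalize by the number of messages per year.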
### Related datasets:
* https://www.kaggle.com/wcukierski/enron-email-dataset
* https://www.kaggle.com/uciml/sms-spam-collection-dataset
Provider:
www.kaggle.com



