SuccessfulCrab/enron
Source: Hugging Face | Updated: 2025-12-15 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/SuccessfulCrab/enron
Description:
This dataset contains the output of running the **Entity Extractor** service over the complete Enron email corpus. The Entity Extractor is a Python-based text analysis tool that combines regex pattern matching with deep learning NER models to identify and extract structured data from unstructured text. The emails were processed through the Entity Extractor's Azure ML endpoint, and the dataset includes a `train` split. The schema has columns for the row index, the original file path, the raw email content (including the Message-ID and headers), and a list of extracted entities. Entity categories include email addresses, phone numbers, BSB numbers, ABNs, account numbers, dates, payment methods, cryptocurrency, and named entities. The dataset is suited to use cases such as compliance and fraud detection, network analysis, document analysis, and cryptocurrency investigation.
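To make the regex half of the description concrete, here is a minimal sketch of pattern-based entity extraction over an email-like string. The patterns, the labels (`EMAIL`, `DATE`, `BSB`), the function name `extract_entities`, and the output shape are all assumptions made for illustration; the listing does not document the Entity Extractor's actual patterns or its output format, and the NER model stage is not shown.

```python
import re

# Hypothetical patterns for illustration only. The real Entity Extractor's
# patterns and labels are not documented in this listing.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),   # ISO-style dates only
    "BSB": re.compile(r"\b\d{3}-\d{3}\b"),          # Australian BSB, NNN-NNN form
}


def extract_entities(text):
    """Return matches as dicts with type, value, and start offset, in text order."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(
                {"type": label, "value": match.group(), "start": match.start()}
            )
    return sorted(found, key=lambda entity: entity["start"])


# A made-up sample, not taken from the Enron corpus.
sample = (
    "From: jane.doe@enron.com\n"
    "Date: 2001-05-14\n"
    "Please use BSB 062-000 for the transfer."
)

entities = extract_entities(sample)
for entity in entities:
    print(entity["type"], entity["value"])
```

A production extractor would likely need broader patterns (for example, other date formats and phone numbers) and would pass the text through an NER model for names and organizations, which simple regexes cannot capture reliably.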
Provider:
SuccessfulCrab



