
Avocado Research Email Collection

Source: DataCite Commons. Updated 2021-07-01. Indexed 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2015T03
Description:
Introduction

Avocado Research Email Collection consists of emails and attachments taken from 279 accounts of a defunct information technology company referred to as "Avocado". Most of the accounts are those of Avocado employees; the remainder represent shared accounts such as "Leads", or system accounts such as "Conference Room Upper Canada".

The collection consists of the processed personal folders of these accounts, with metadata describing folder structure, email characteristics and contacts, among others. It is expected to be useful for social network analysis, e-discovery and related fields.

Data

The source data for the collection consisted of Personal Storage Table (PST) files for 282 accounts. A PST file is used by MS Outlook to store emails, calendar entries, contact details, and related information. Data was extracted from the PST files using libpst version 0.6.54. Three files produced no output and are not included in the collection, which leaves the 279 accounts described above. Each account is referred to as a "custodian", although some of the accounts do not correspond to humans.

The collection is divided into metadata and text. The metadata is represented in XML, with a single top-level XML file listing the custodians, and then one XML file per custodian listing all items extracted from that custodian's PST files. The full XML tree can be read by loading the top-level file with an XML parser that handles directives. All XML metadata files are encoded in UTF-8. The text contains the extracted text of the items in the custodians' folders, with the extracted text for each item held in a separate file. The text files are then zipped into one zip file per custodian.

Licensing

Users are required to sign two license agreements in order to access this corpus: the Avocado Collection Organizational License Agreement and the Avocado Collection End User Agreement. Those agreements can be viewed in the License field of this catalog entry.

Updates

None at this time.

Portions © 2015 Sherwood Partners, © 2015 Trustees of the University of Pennsylvania
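As a rough illustration of the layout described under Data (one XML metadata file and one zip of text files per custodian), the following Python sketch lists items from a per-custodian metadata file and reads the matching text files from a zip. It is only a sketch under stated assumptions: the catalog entry does not give the XML schema or file naming, so the `item` element, its `id` attribute, the `.txt` member names, and the UTF-8 decoding of the text files are all hypothetical placeholders, and the sample data is synthetic. It also works on a single per-custodian file and does not attempt to follow the top-level file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def list_items(custodian_xml: str) -> list[str]:
    """Return the id of every <item> element in a per-custodian metadata file.

    The <item> element and its "id" attribute are hypothetical; the real
    schema is not described in the catalog entry.
    """
    root = ET.fromstring(custodian_xml)
    return [item.get("id") for item in root.iter("item")]


def read_item_texts(zip_bytes: bytes) -> dict[str, str]:
    """Map each file name in a per-custodian zip to its decoded text."""
    texts = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            # Assumption: text files are UTF-8; undecodable bytes are replaced.
            texts[name] = archive.read(name).decode("utf-8", errors="replace")
    return texts


# Synthetic stand-ins for one custodian's metadata file and text archive.
sample_xml = (
    '<custodian id="c001">'
    '<item id="c001-0001"/><item id="c001-0002"/>'
    "</custodian>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("c001-0001.txt", "Hello from item one")
    archive.writestr("c001-0002.txt", "Hello from item two")
sample_zip = buffer.getvalue()

item_ids = list_items(sample_xml)
texts = read_item_texts(sample_zip)
```

With the real corpus, the same two functions would be pointed at the actual file contents once the element and file names have been checked against the distributed metadata.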
Provider:
Linguistic Data Consortium
Created:
2020-11-30
Background overview
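The Avocado Research Email Collection is a dataset of emails and attachments from 279 accounts of a defunct IT company referred to as "Avocado". The data was extracted from PST files and is divided into XML metadata and zipped text files. It is suited to research in social network analysis, e-discovery, and related fields. Access requires signing specific license agreements.
The summary above was collected and generated by 遇见数据集.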