teyler/epstein-files-20k
Source: Hugging Face | Updated: 2025-12-14 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/teyler/epstein-files-20k
Description:
This dataset is a collection of public documents related to the Epstein estate, intended to support transparent research and exploration. It contains over 25,000 plain-text files derived from materials publicly released by the U.S. House Oversight Committee. The files fall into two categories: text converted from original text-based files (e.g., PDFs, emails), and text extracted from image files via OCR. The dataset is intended primarily for research and educational purposes, such as information retrieval, text mining, and exploratory analysis. Users must comply with applicable legal and ethical requirements and must not use the dataset for unlawful or unethical purposes.
Provider:
teyler



