
AfricanKillshot/Epstein-Files

Hugging Face · Updated 2026-02-20 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/AfricanKillshot/Epstein-Files
Resource description:
---
license: mit
language:
- en
tags:
- epstein
- epsteinfiles
- files
- epstein-files
size_categories:
- 100G<n<1T
configs:
- config_name: default
  data_files:
  - split: train
    path: "*.parquet"
---

# The Epstein Files Dataset

![Epstein-Header](assets/controversial_logo.png)

## Dataset Description

### Dataset Summary

This dataset comprises a curated collection of publicly available documents and related materials concerning Jeffrey Epstein. It includes unsealed court filings, FBI reports, DOJ publications, and other official investigative records. These files have been aggregated from reputable public sources, such as the U.S. Department of Justice Epstein Library, House Oversight Committee releases, and unsealed federal court documents.

The dataset is presented in processed formats, including extracted text from PDFs and binary representations of any associated audio, images, or videos, to facilitate research, analysis, and archival purposes. All content is derived from public domain materials, with no addition of new copyrighted elements. Any applied processing is released under the MIT License.

## Dataset Creation

### Processing Details

The data is stored in Apache Parquet format.

- **Text Extraction**: Automated extraction of text from PDF documents.
- **Multimedia Handling**: Audio, images, and videos are stored as binary blobs within the parquet structs.

### Recreate the Dataset

To recreate this dataset exactly as provided, refer to the GitHub repository at https://github.com/Nikityyy/Epstein-Files.

## Additional Information

### Licensing Information

Underlying documents are in the public domain per U.S. law. Processing contributions are licensed under MIT.

### Ethical Considerations

This dataset involves sensitive topics related to investigations of abuse and exploitation. Users are encouraged to handle the data responsibly, respecting privacy and legal standards.
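
Because the card says multimedia is stored as binary blobs inside Parquet structs but does not document column names, a reader may need to discover where the blobs sit in a row and what kind of file each one is. The following is a minimal sketch under that assumption, not the author's tooling: `sniff_media_type`, `find_blobs`, and `stream_rows` are hypothetical helper names, the media types are guessed from well-known file signatures only, and `stream_rows` assumes the Hugging Face `datasets` library is installed and that the repository loads with its default configuration.

```python
from typing import Any, Iterator, Tuple

# Well-known file signatures ("magic numbers") that appear at offset 0.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"ID3", "audio/mpeg"),  # MP3 files that start with an ID3v2 tag
]


def sniff_media_type(blob: bytes) -> str:
    """Guess a media type from the first bytes of a blob (heuristic only)."""
    for prefix, media_type in _SIGNATURES:
        if blob.startswith(prefix):
            return media_type
    # RIFF containers carry their format tag at bytes 8..12.
    if blob[:4] == b"RIFF" and blob[8:12] == b"WAVE":
        return "audio/wav"
    # ISO base media files (the MP4 family) carry "ftyp" at bytes 4..8.
    # Reported as video/mp4 here, although audio-only files match too.
    if blob[4:8] == b"ftyp":
        return "video/mp4"
    return "application/octet-stream"


def find_blobs(value: Any, path: str = "") -> Iterator[Tuple[str, bytes]]:
    """Walk a row (nested dicts and lists) and yield (path, blob) pairs."""
    if isinstance(value, (bytes, bytearray)):
        yield path, bytes(value)
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from find_blobs(item, f"{path}.{key}" if path else str(key))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from find_blobs(item, f"{path}[{index}]")


def stream_rows(limit: int = 3) -> Iterator[dict]:
    """Yield the first `limit` rows without downloading the whole dataset.

    Assumption: the `datasets` library is available and the repository
    id from this page resolves. The import is deferred so the helpers
    above work without it.
    """
    from datasets import load_dataset

    rows = load_dataset(
        "AfricanKillshot/Epstein-Files", split="train", streaming=True
    )
    for index, row in enumerate(rows):
        if index >= limit:
            break
        yield row
```

A possible use is `for row in stream_rows(): print([(p, sniff_media_type(b)) for p, b in find_blobs(row)])`. Streaming is chosen here because the card lists the size category as 100G<n<1T, so loading the full `train` split into memory is unlikely to be practical.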

Provider:
AfricanKillshot