BEE-spoke-data/govdocs1-by-extension
Source: Hugging Face. Updated: 2025-07-27. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/BEE-spoke-data/govdocs1-by-extension
Overview:
The govdocs1-by-extension dataset contains Markdown-parsed versions of the documents in govdocs1, with light filtering. It is divided into configurations, one per source file extension, including doc, docx, logs, ppt, pptx, rtf, and txt. Each configuration contains train, validation, and test splits. Download size and dataset size vary by configuration.
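The configuration-and-split layout described above can be sketched with the Hugging Face `datasets` library. This is a minimal sketch, not an official loading recipe. It assumes that each configuration name is the bare extension string (for example `txt`) and that the repository is read from the Hugging Face Hub under the ID shown in the title. The helper name `load_split` is hypothetical, and the column names of the records are not stated in this listing, so the sketch does not rely on any.

```python
from typing import Any

# Repository ID taken from the listing title.
REPO_ID = "BEE-spoke-data/govdocs1-by-extension"

# The listing states that every configuration has these three splits.
SPLITS = ("train", "validation", "test")


def load_split(extension: str, split: str = "train", streaming: bool = True) -> Any:
    """Load one split of one per-extension configuration.

    `extension` is assumed (not confirmed by the listing) to be the
    configuration name itself, e.g. "txt" or "docx".
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")

    # Imported lazily so the argument check above works without the library.
    from datasets import load_dataset

    # streaming=True avoids downloading a whole configuration up front,
    # which helps because sizes vary by configuration.
    return load_dataset(REPO_ID, name=extension, split=split, streaming=streaming)
```

Under these assumptions, `load_split("txt", "validation")` would return an iterable over the validation records of the `txt` configuration; inspecting the first record is the simplest way to discover the actual field names.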
Provider:
BEE-spoke-data



