davanstrien/finepdfs-nld-stats
Source: Hugging Face. Updated 2025-12-11. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/davanstrien/finepdfs-nld-stats
Description:
This dataset contains aggregate statistics for the Dutch (Latin script) portion of the HuggingFaceFW/finepdfs-edu dataset. It includes a global summary (total documents, total tokens, average tokens per document, and similar figures) together with language statistics, extractor statistics, and dump statistics. The dataset also serves as a demonstration of Polars streaming aggregation combined with Hugging Face Hub integration, relying on Polars' efficient processing to compute the statistics quickly.
Provider:
davanstrien



