davanstrien/finepdfs-temporal-stats-nl
Source: Hugging Face. Updated 2025-12-11; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/davanstrien/finepdfs-temporal-stats-nl
Summary:
This dataset presents a temporal analysis of educational quality in German (Latin script) web content across 106 CommonCrawl dumps. It tracks trends in educational content from 2013 to 2025, including average educational scores and high-education content rates. The analysis covers 3,844,508 processed documents with 20,308,107,262 total tokens. Results show an average educational score of 2.050 and a high-education rate of 13.1%. The dataset is organized into global statistics (overall summary) and temporal statistics (per-dump chronological breakdown).
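The two reported metrics, and the split into global and temporal statistics, can be illustrated with a minimal Python sketch. It is not the dataset author's pipeline: the function names, the toy records, and the score threshold of 3.0 used to mark a document as "high-education" are all assumptions made for illustration, since the summary does not state how that rate is defined.

```python
from collections import defaultdict

# Assumption: a document counts as "high-education" when its score is at
# least 3.0. The summary does not state the actual cutoff.
HIGH_EDU_THRESHOLD = 3.0


def summarize(scores, threshold=HIGH_EDU_THRESHOLD):
    """Return document count, mean score, and high-education rate."""
    count = len(scores)
    return {
        "doc_count": count,
        "avg_edu_score": sum(scores) / count,
        "high_edu_rate": sum(s >= threshold for s in scores) / count,
    }


def global_stats(records, threshold=HIGH_EDU_THRESHOLD):
    """Overall summary across every (dump, score) record."""
    return summarize([score for _, score in records], threshold)


def temporal_stats(records, threshold=HIGH_EDU_THRESHOLD):
    """Per-dump breakdown, ordered by dump name."""
    by_dump = defaultdict(list)
    for dump, score in records:
        by_dump[dump].append(score)
    return {dump: summarize(by_dump[dump], threshold) for dump in sorted(by_dump)}


# Toy records (dump name, educational score); values are made up.
records = [
    ("CC-MAIN-2013-20", 1.0),
    ("CC-MAIN-2013-20", 3.0),
    ("CC-MAIN-2025-05", 2.0),
    ("CC-MAIN-2025-05", 4.0),
]

overall = global_stats(records)
per_dump = temporal_stats(records)
```

Here `overall` plays the role of the global statistics and `per_dump` the role of the temporal statistics, with one entry per CommonCrawl dump.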
Provider:
davanstrien



