davanstrien/finepdfs-temporal-stats-all
Source: Hugging Face | Updated: 2025-12-11 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/davanstrien/finepdfs-temporal-stats-all
Description:
This dataset presents a temporal analysis of educational quality across all languages in 106 CommonCrawl dumps. It traces trends in educational web content from 2013 to 2025, including changes in the average educational score and in the share of high-educational-content documents. The analysis covers 49,526,501 documents totaling 360,705,622,033 tokens, with an average educational score of 1.997 and a high-educational-content rate of 14.5%. It has two parts: global statistics (an overall summary) and temporal statistics (a per-dump breakdown sorted chronologically).
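As a rough illustration of how per-dump temporal statistics relate to a global summary, the sketch below aggregates a few rows into a document-weighted average score and a high-educational-content rate. The field names (`dump`, `doc_count`, `avg_edu_score`, `high_edu_count`) and the toy numbers are hypothetical and are not taken from the dataset's actual schema. The chronological sort assumes dump identifiers of the form `CC-MAIN-YYYY-WW`, which order correctly as plain strings.

```python
from dataclasses import dataclass


@dataclass
class DumpStats:
    # Hypothetical per-dump record; the real column names may differ.
    dump: str             # dump identifier, assumed to look like "CC-MAIN-2013-20"
    doc_count: int        # number of documents in this dump
    avg_edu_score: float  # mean educational score of those documents
    high_edu_count: int   # documents counted as high educational content


def sort_chronologically(rows):
    """Order dumps by identifier; valid only under the CC-MAIN-YYYY-WW assumption."""
    return sorted(rows, key=lambda r: r.dump)


def global_summary(rows):
    """Combine per-dump rows into one document-weighted summary."""
    total_docs = sum(r.doc_count for r in rows)
    if total_docs == 0:
        raise ValueError("no documents to summarize")
    # Weight each dump's mean by its document count so large dumps count more.
    weighted_score = sum(r.avg_edu_score * r.doc_count for r in rows)
    high_edu = sum(r.high_edu_count for r in rows)
    return {
        "total_docs": total_docs,
        "avg_edu_score": weighted_score / total_docs,
        "high_edu_rate": high_edu / total_docs,
    }


# Toy rows for illustration only; these are not values from the dataset.
rows = [
    DumpStats("CC-MAIN-2025-05", 100, 2.0, 10),
    DumpStats("CC-MAIN-2013-20", 300, 1.0, 30),
]
ordered = sort_chronologically(rows)
summary = global_summary(rows)
```

Weighting by document count matters here: a plain mean of the per-dump averages would give every dump equal influence regardless of its size.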
Provided by:
davanstrien



