davanstrien/finepdfs-temporal-stats-fin
Source: Hugging Face | Updated: 2025-12-11 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/davanstrien/finepdfs-temporal-stats-fin
Description:
The dataset, titled "Is the Web Getting More Educational?", is a temporal analysis of the educational quality of Finnish-language (fin_Latn) content across 105 CommonCrawl dumps. It has two components: global_stats and temporal_stats. Between 2013 and 2025, the share of highly educational content rose from 4.0% to 11.6%, and the average educational score rose from 1.55 to 1.76. Overall, the analysis covers 211,328 documents totaling 2,604,770,921 tokens, with an average educational score of 1.720 and a high-educational-content rate of 10.2%.
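To put the quoted figures in perspective, the short sketch below derives the percentage-point change, the growth ratio, and the average document length. It uses only the numbers stated in the description; the constant and function names are illustrative and are not part of the dataset's schema.

```python
# Figures copied from the description above; names are illustrative only.
HIGH_EDU_RATE_2013 = 4.0    # percent of documents rated highly educational
HIGH_EDU_RATE_2025 = 11.6
AVG_SCORE_2013 = 1.55       # mean educational score
AVG_SCORE_2025 = 1.76
TOTAL_DOCS = 211_328
TOTAL_TOKENS = 2_604_770_921


def summarize_change(start: float, end: float) -> tuple[float, float]:
    """Return (absolute change, ratio of end to start)."""
    return end - start, end / start


rate_delta, rate_ratio = summarize_change(HIGH_EDU_RATE_2013, HIGH_EDU_RATE_2025)
score_delta, score_ratio = summarize_change(AVG_SCORE_2013, AVG_SCORE_2025)
tokens_per_doc = TOTAL_TOKENS / TOTAL_DOCS

print(f"High-edu rate: +{rate_delta:.1f} percentage points ({rate_ratio:.2f}x)")
print(f"Average score: +{score_delta:.2f} ({(score_ratio - 1) * 100:.1f}% higher)")
print(f"Average tokens per document: {tokens_per_doc:,.0f}")
```

Read this way, the high-educational share nearly triples over the period while the mean score moves only modestly, which suggests the growth is concentrated in the upper tail of the score distribution rather than in a uniform shift.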
Provider:
davanstrien



