
WikiRank quality scores and measures for Wikipedia articles (April 2022)

Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2024-08-18.
Download link:
https://figshare.com/articles/dataset/WikiRank_quality_scores_and_measures_for_Wikipedia_articles_April_2022_/19762927/1
Description:
These datasets list over 43 million Wikipedia articles in 55 languages, each with a quality score from WikiRank (https://wikirank.net). The datasets also contain the quality measures (metrics) that directly affect these scores. The quality measures were extracted from Wikipedia dumps from April 2022.

License
All files included in these datasets are released under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/

Format
page_id -- identifier of the Wikipedia article (int), e.g. 840191
page_name -- title of the Wikipedia article (utf-8), e.g. Sagittarius A*
wikirank_quality -- quality score for the Wikipedia article on a scale of 0 to 100 (as of April 1, 2022). This is a synthetic measure calculated from the metrics below, which are also included in the datasets.
norm_len -- normalized "page length"
norm_refs -- normalized "number of references"
norm_img -- normalized "number of images"
norm_sec -- normalized "number of sections"
norm_reflen -- normalized "references per length ratio"
norm_authors -- normalized "number of authors" (without bots and anonymous users)
flawtemps -- flaw templates
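The field list above can be read with a few lines of Python. The following is a minimal sketch, not the publisher's own tooling: it assumes the files are comma-separated with a header row that uses the column names from the description, and that flawtemps is numeric. Neither assumption is confirmed by the description, so check the actual files first. The sample rows are made up for illustration (only the page_id and page_name of the first row come from the description), and the helper names parse_rows and top_articles are hypothetical.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = [
    "page_id", "page_name", "wikirank_quality",
    "norm_len", "norm_refs", "norm_img", "norm_sec",
    "norm_reflen", "norm_authors", "flawtemps",
]

# Made-up sample rows for illustration only; the numeric values are NOT real data.
# Assumption: comma-separated values with a header row.
SAMPLE = (
    "page_id,page_name,wikirank_quality,norm_len,norm_refs,norm_img,"
    "norm_sec,norm_reflen,norm_authors,flawtemps\n"
    "840191,Sagittarius A*,50.0,60.0,70.0,40.0,55.0,65.0,45.0,0\n"
    "1,Example article,20.5,10.0,5.0,0.0,15.0,8.0,12.0,1\n"
)


def parse_rows(handle):
    """Yield one dict per article, converting numeric fields from text."""
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    for row in reader:
        record = {"page_id": int(row["page_id"]), "page_name": row["page_name"]}
        # Assumption: every remaining column, including flawtemps, is numeric.
        for name in COLUMNS[2:]:
            record[name] = float(row[name])
        yield record


def top_articles(records, min_quality):
    """Return records scoring at least min_quality, best first."""
    kept = [r for r in records if r["wikirank_quality"] >= min_quality]
    return sorted(kept, key=lambda r: r["wikirank_quality"], reverse=True)


records = list(parse_rows(io.StringIO(SAMPLE)))
best = top_articles(records, 30.0)
for article in best:
    print(article["page_id"], article["page_name"], article["wikirank_quality"])
```

To read a downloaded file instead of the inline sample, pass a handle opened with open(path, encoding="utf-8", newline="") to parse_rows. Because parse_rows is a generator, a large file can be filtered row by row without loading every article into memory.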
Provider:
figshare
Created:
2022-05-13