English Wikipedia Quality Assessment Dataset
Source: DataCite Commons
Updated: 2025-06-01
Indexed: 2024-07-25
Download link:
https://figshare.com/articles/dataset/English_Wikipedia_Quality_Asssessment_Dataset/1375406/1
Description:
Dataset of 30,272 articles from English Wikipedia, gathered on 2015-02-05, together with the content (wiki markup) of the associated revisions. The dataset is split into a 90% training set and a 10% test set. Articles were sampled from six of English Wikipedia's seven assessment classes (FA, GA, B, C, Start, and Stub). The Featured Article (FA) class is the one exception to sampling: it contains all 4,455 featured articles identified at the time. A-class is not part of the dataset because it is rarely used (fewer than 0.02% of all articles). Each article is assumed to belong to the highest quality class it has been rated as, and the article history was mined to find the revision associated with that rating. For details, see "The Success and Failure of Quality Improvement Projects in Peer Production Communities" by Warncke-Wang et al. (CSCW 2015). This dataset has been used to train the machine learner in the Wiki-Class Python library.
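The labeling rule described above (an article takes the highest class it has been rated as) can be sketched as follows. This is a minimal illustration, not the authors' mining code. The function name `highest_class`, the `QUALITY_ORDER` table, and the assumption that an article may carry several ratings at once (for example, one per WikiProject banner) are illustrative choices.

```python
# Hypothetical helper illustrating the "highest rated class" labeling rule.
# English Wikipedia's assessment scale, ordered from lowest to highest quality.
# A-class is listed for completeness even though this dataset omits it.
QUALITY_ORDER = ["Stub", "Start", "C", "B", "GA", "A", "FA"]
RANK = {label: i for i, label in enumerate(QUALITY_ORDER)}


def highest_class(ratings):
    """Return the highest-ranked quality class among an article's ratings.

    `ratings` is an iterable of class labels, assumed here to come from
    several independent raters (e.g. one per WikiProject). Labels outside
    the quality scale are ignored; returns None if no known label remains.
    """
    known = [r for r in ratings if r in RANK]
    if not known:
        return None
    return max(known, key=RANK.__getitem__)


if __name__ == "__main__":
    # An article rated Start, B and C by different raters is labeled B.
    print(highest_class(["Start", "B", "C"]))
```

Taking the maximum, rather than a majority vote, follows the dataset's stated assumption; a majority or most-recent rule would be an equally simple swap if a different labeling policy were wanted.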
Provider:
figshare
Created:
2016-01-19



