English Wikipedia Quality Asssessment Dataset

Source: DataCite Commons. Updated 2025-06-01. Indexed 2024-07-25.
Download link:
https://figshare.com/articles/dataset/English_Wikipedia_Quality_Asssessment_Dataset/1375406/1
Description:
Dataset of 30,272 articles from English Wikipedia, gathered on 2015/02/05, with associated revision content (wiki markup). The dataset is split into a 90% training set and a 10% test set. It covers six of English Wikipedia's seven assessment classes. Articles in each class were sampled, except for the Featured Article class, for which all 4,455 articles identified at the time are included. The A-class is not part of the dataset because it is rarely used (< 0.02% of all articles). Each article is assumed to belong to the highest quality class it has been rated as, and article history was mined to find the revision associated with that rating. For details, see "The Success and Failure of Quality Improvement Projects in Peer Production Communities" by Warncke-Wang et al. (CSCW 2015). This dataset has been used to train the machine learner in the Wiki-Class Python library.
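The labeling rule and the 90%/10% split described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `highest_class` and `split_train_test`, the fixed seed, and the use of a simple random (non-stratified) shuffle are all assumptions made for illustration. The class names follow English Wikipedia's assessment scale with A-class left out, as in the dataset.

```python
import random

# Assessment classes covered by the dataset, lowest to highest quality.
# A-class is omitted, matching the dataset description.
CLASS_ORDER = ["Stub", "Start", "C", "B", "GA", "FA"]
RANK = {name: i for i, name in enumerate(CLASS_ORDER)}


def highest_class(ratings):
    """Return the highest-ranked known class among an article's ratings.

    Hypothetical helper: ratings outside CLASS_ORDER are ignored, and
    None is returned when no known rating is present.
    """
    known = [r for r in ratings if r in RANK]
    if not known:
        return None
    return max(known, key=RANK.__getitem__)


def split_train_test(items, test_fraction=0.1, seed=0):
    """Shuffle items and split them into (train, test) lists.

    Assumption: a plain random split. The source does not say whether
    the original split was stratified by class.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

For example, an article rated both Start and B would be labeled B under this rule, and splitting 100 items would yield 90 for training and 10 for testing.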
Provider:
figshare
Created:
2016-01-19