
The Collaborative Organization of Knowledge: Data Set

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/2526702
Description:
Wikipedia is an ongoing endeavor to create a free encyclopedia through an open computer-mediated collaborative effort. How does Wikipedia grow and maintain its coverage? This page contains supporting material relevant to a publication that examines this question.

Diomidis Spinellis and Panagiotis Louridas. The collaborative organization of knowledge. Communications of the ACM, 51(8):68–73, August 2008. (doi:10.1145/1378704.1378720)

In the above paper, a longitudinal study of Wikipedia's evolution shows that although Wikipedia's scope is increasing, its coverage is not deteriorating. This can be explained by the fact that referring to a non-existing entry typically leads to the establishment of an article for it. Wikipedia's evolution also demonstrates the creation of a large real-world scale-free graph through a combination of incremental growth and preferential attachment.

Through this data set you can download the processed results.

The file starts with a header giving various attributes of the processed data set.

    % Number of bins: 72
    % Total revisions: 28247658
    % Maximum revisions: 28273 (George W. Bush)
    % Maximum reverts: 9218 (George W. Bush)
    % Number of moves: 81380
    % Total pages: 1898139
    % Revisions from IP addresses: 8518913
    % Total contributors: 230130
    % Maximum different contributors: 2539 (George W. Bush)
    % Redirected pages: 631567
    % Restricted pages: 2441
    % Maximum number of contained references: 17577 (List of all three letter acronyms)
    % Pages with at least one revert: 211704
    % Total number of reverts across all pages: 1147151
    % Total time between reverts: 54524346346
    % Moved pages: 80332

Next comes one line of data for each one of Wikipedia's entries. Here is an example (a single line in the file).

    A (musical note):1128386876:Mailer diablo:1130566991:MrD9:10:7:18:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:1:1:1:2:2:2:2:2:2:2:2:2:2:2:2:E

Each line contains the following fields, in this order.

    Entry name
    Time of first definition (in seconds since the Unix epoch)
    Name of the contributor who first defined the entry
    Time of first reference (in seconds since the Unix epoch)
    Name of the contributor who first referenced the entry
    Number of references
    Number of contributors
    Number of revisions
    Number of reverts
    For each one of the time period bins (72 in this file), the number of references to the entry
    The letter "E"

The fields are colon-separated. Colons in the input data are converted to underscores.

Finally come lines summarizing the data set's characteristics for each time period. Here is an example.

    2001-07-01 4851 0 27106 15129 13458 531

Each line contains the following fields, in this order.

    Start date of this period
    Number of entries
    Number of entries that are stubs
    Number of references
    Number of referenced articles
    Number of undefined entries
    Number of active contributors in this period
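
The two record layouts described above can be read with a few lines of Python. The following is a minimal sketch, not part of the data set's own tooling: the names `Entry`, `PeriodSummary`, `parse_entry`, and `parse_summary` are hypothetical, and the code assumes that every time and count field is a plain integer and that the summary fields are separated by whitespace, as in the example line.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Entry:
    """One per-entry record (field names are illustrative)."""
    name: str
    first_defined: int      # seconds since the Unix epoch
    defined_by: str
    first_referenced: int   # seconds since the Unix epoch
    referenced_by: str
    references: int
    contributors: int
    revisions: int
    reverts: int
    bins: List[int]         # references per time period bin


@dataclass
class PeriodSummary:
    """One per-period summary record (field names are illustrative)."""
    start: date
    entries: int
    stubs: int
    references: int
    referenced_articles: int
    undefined_entries: int
    active_contributors: int


def parse_entry(line: str, n_bins: Optional[int] = None) -> Entry:
    """Parse a colon-separated entry line ending in the letter "E".

    Splitting on ":" is safe because colons in the input data were
    converted to underscores. Pass n_bins (72 for this file) to check
    the number of time period bins.
    """
    fields = line.strip().split(":")
    if len(fields) < 10 or fields[-1] != "E":
        raise ValueError("not an entry line")
    bins = [int(x) for x in fields[9:-1]]
    if n_bins is not None and len(bins) != n_bins:
        raise ValueError(f"expected {n_bins} bins, found {len(bins)}")
    return Entry(fields[0], int(fields[1]), fields[2], int(fields[3]),
                 fields[4], *(int(x) for x in fields[5:9]), bins)


def parse_summary(line: str) -> PeriodSummary:
    """Parse a whitespace-separated per-period summary line."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("not a summary line")
    return PeriodSummary(date.fromisoformat(fields[0]),
                         *(int(x) for x in fields[1:]))


if __name__ == "__main__":
    # Shortened illustration: the example entry with only 3 bins kept.
    short = "A (musical note):1128386876:Mailer diablo:1130566991:MrD9:10:7:18:0:0:1:2:E"
    print(parse_entry(short))
    print(parse_summary("2001-07-01 4851 0 27106 15129 13458 531"))
```

When reading the real file, header lines can be recognized by their leading `%`, and `parse_entry(line, n_bins=72)` can be used to reject lines whose bin count does not match the header.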
Created:
2020-01-24