five

Wikipedia time-series graph

收藏
NIAID Data Ecosystem2026-03-11 收录
下载链接:
https://zenodo.org/record/886483
下载链接
链接失效反馈
官方服务:
资源简介:
Wikipedia temporal graph. The dataset is based on two Wikipedia SQL dumps: (1) English language articles and (2) user visit counts per page per hour (aka pagecounts). The original datasets are publicly available on the Wikimedia website. Static graph structure is extracted from English language Wikipedia articles. Redirects are removed. Before building the Wikipedia graph we introduce thresholds on the minimum number of visits per hour and maximum in-degree. We remove the pages that have less than 500 visits per hour at least once during the specified period. Besides, we remove the nodes (pages) with in-degree higher than 8 000 to build a more meaningful initial graph. After cleaning, the graph contains 116 016 nodes (out of total 4 856 639 pages), 6 573 475 edges. The graph can be imported in two ways: (1) using edges.csv and vertices.csv or (2) using enwiki-20150403-graph.gt file that can be opened with open source Python library Graph-Tool. Time-series data contains users' visit counts from 02:00, 23 September 2014 until 23:00, 30 April 2015. The total number of hours is  5278. The data is stored in two formats: CSV and H5. CSV file contains data in the following format [page_id :: count_views :: layer], where layer represents an hour. In H5 file, each layer corresponds to an hour as well.
创建时间:
2020-01-24
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作