Extended Wikipedia Web Traffic Daily Dataset (without Missing Values)
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7371037
Description:
This dataset contains 145063 time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2022-06-30. This is an extended version of the dataset that was used in the Kaggle Wikipedia Web Traffic forecasting competition. For consistency, the same Wikipedia pages that were used in the competition have been used in this dataset as well. The colons (:) in article names have been replaced by dashes (-) to make the .tsf file readable using our data loaders.
The original dataset contains missing values; in this version they have been replaced with zeros.
The data were downloaded from the Wikimedia REST API. In accordance with the API's terms, this dataset is licensed under the CC-BY-SA 3.0 and GFDL licenses.
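The two preprocessing steps described above (replacing colons in article names with dashes, and replacing missing values with zeros) can be sketched as follows. This is only an illustration, not the authors' actual pipeline: the function names `sanitize_name` and `fill_missing_with_zero`, the sample series name, and the sample values are all hypothetical.

```python
import math


def sanitize_name(name: str) -> str:
    """Replace colons with dashes in a series name (hypothetical helper)."""
    return name.replace(":", "-")


def fill_missing_with_zero(values):
    """Return a copy of the series with None/NaN entries replaced by 0."""
    return [
        0 if v is None or (isinstance(v, float) and math.isnan(v)) else v
        for v in values
    ]


# Made-up example series; the name and values are NOT taken from the dataset.
raw = {
    "Special:Search_en.wikipedia.org_all-access_all-agents": [
        12, None, 7, float("nan"), 3
    ],
}

clean = {sanitize_name(k): fill_missing_with_zero(v) for k, v in raw.items()}
print(clean)
```

Note that after zero-filling, a day with missing data can no longer be told apart from a day with genuinely zero hits, which may matter when evaluating forecasts on this version of the dataset.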
Created:
2022-11-28



