Architectural styles of curiosity in global Wikipedia mobile app readership
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://zenodo.org/records/13922132
Resource description:
Description of the data and file structure
These directories contain the code, aggregated data, and preprocessing scripts needed to re-create the figures in "Architectural styles of curiosity in global Wikipedia readership".
Files and variables
File: Archive.zip
Figures: Publicly usable illustrations are available here.
Description: Contains analysis, data, preprocessing, results, and utils folders. 14 directories, 25 files.
|-- analysis
|   |-- KNOT_analysis.R <- analyzes laboratory data
|   |-- analyze_1000-networks.ipynb <- analyzes naturalistic data
|   |-- analyze_1000-networks_comparison-knot-rw_clean.ipynb <- compares datasets and nulls
|   |-- analyze_KNOT_networks.ipynb <- analyzes laboratory data
|   |-- analyze_forward_flow.ipynb <- calculates forward flow
|   |-- forest_plots.R <- correlations with sociodemographic variables
|   |-- topic_analysis.R <- analysis of topic and information diversity
|   `-- worldmap.R <- visualization of geographical data sources
|-- data
|   |-- laboratory_data <- variables for laboratory browsing and survey data
|   |-- mobile_app_data <- aggregated data for network structure and topic (rows are individuals)
|   |-- pretrained_embeddings <- fastText word embeddings
|   |-- spatial_navigation <- data from Sea Hero Quest
|   |-- surveys <- data from nationally aggregated sociodemographic surveys
|   `-- wikispeedia <- data from WikiSpeedia game
|-- preprocessing
|   |-- data_knowledge-networks_generate-subsample_clean.ipynb <- processes mobile app data
|   |-- data_knowledge-networks_metrics-combined_clean.ipynb <- calculates network metrics
|   |-- data_knowledge-networks_rw_get-data.ipynb <- calculates null networks
|   `-- data_knowledge-networks_sessions-app_cleaned.ipynb <- processes individual browsing
|-- requirements.txt
|-- results
|   |-- UMAP <- data used to generate network embedding (rows are individuals)
|   `-- figs <- code for generated figure on forward flow
`-- utils
    |-- plot_knowledge-networks_network-comparison.ipynb <- visualizations of network comparisons
    |-- plot_knowledge-networks_network-metrics_distance.ipynb <- visualizations of distance between datasets
    |-- plot_knowledge-networks_summary-stats.ipynb <- visualizations of summary stats
    |-- utils_embedding.py <- get word and document embeddings
    |-- utils_filtration_metrics.py <- higher-order topology functions (unused)
    |-- utils_gt.py <- graph-tool functions
    |-- utils_network.py <- functions to make networks from series of article IDs
    |-- utils_network_metrics.py <- network metrics
    |-- utils_networkx.py <- networkx functions
    |-- utils_rw.py <- functions to generate random walks and null models
    `-- utils_tokenizer.py <- functions for processing embeddings
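For readers unfamiliar with the idea behind utils_network.py (making networks from a series of article IDs), the following is a minimal sketch of one plausible construction, not the code shipped in Archive.zip. It assumes that nodes are the distinct articles a reader visited and that an edge joins any two articles visited consecutively. The function names, the sample IDs, and the density measure are hypothetical illustrations; the archive's own functions may define edges and metrics differently.

```python
from collections import Counter


def build_knowledge_network(article_ids):
    """Turn one reader's ordered article visits into nodes and weighted edges.

    Assumption: an undirected edge links each pair of consecutively
    visited articles, and its weight counts how often that step occurred.
    """
    nodes = set(article_ids)
    edges = Counter()
    for src, dst in zip(article_ids, article_ids[1:]):
        if src != dst:  # ignore reloads of the same article
            edges[frozenset((src, dst))] += 1
    return nodes, edges


def edge_density(nodes, edges):
    """Fraction of possible undirected node pairs that are connected."""
    n = len(nodes)
    possible = n * (n - 1) / 2
    return len(edges) / possible if possible else 0.0


# Hypothetical browsing sequence of article IDs for one reader.
visits = [101, 202, 303, 101, 202, 404]
nodes, edges = build_knowledge_network(visits)
density = edge_density(nodes, edges)
```

Standard-library containers are used here only to keep the sketch self-contained; the archive lists separate graph-tool and networkx helpers (utils_gt.py, utils_networkx.py) for the actual analyses.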
Code/software
See requirements.txt
Access information
Other publicly accessible locations of the data:
* https://gitlab.wikimedia.org/repos/research/curiosity
Additional data was derived from the following sources:
* Human Development Index
* World Happiness Report
* WikiSpeedia
* FastText
* Sea Hero Quest
* Knowledge Networks Over Time Study
Created:
2025-03-16



