
Data for "Modeling the Global Citation Network using the Scalable Agent-based Simulator for Citation Analysis with Recency-emphasized Sampling (SASCA-ReS)"

Source: DataCite Commons. Updated: 2025-12-09. Indexed: 2026-05-03.
Download link:
https://databank.illinois.edu/datasets/IDB-9265079
Description:
This dataset principally consists of four synthetic citation networks that were generated during the preparation of the manuscript Park M, Yi H, Warnow T, and Chacko G (2025). Modeling the Global Citation Network using the Scalable Agent-based Simulator for Citation Analysis with Recency-emphasized Sampling (SASCA-ReS). A preprint is available on Zenodo (below) and the manuscript has been submitted to the MetaRoR platform for review and feedback.

@misc{park_2025_17789558,
  author = {Park, Minhyuk and Yi, Haotian and Warnow, Tandy and Chacko, George},
  title = {Modeling the Global Citation Network using the Scalable Agent-based Simulator for Citation Analysis with Recency-emphasized Sampling (SASCA-ReS)},
  month = dec,
  year = 2025,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.17789558},
  url = {https://doi.org/10.5281/zenodo.17789558},
}

The networks contain roughly 14, 76, 161, and 218 million nodes, respectively. For each network, the nodelist (with attributes) and the edgelist are provided as gzipped parquet files, along with a copy of the configuration file that was passed to the SASCA-ReS software to generate the network. The software can be accessed at: https://github.com/illinois-or-research-analytics/SASCA-ReS. For example: abm14_config.ini; abm14_edgelist.parquet.gz; and abm14_nodelist.parquet.gz. The column headers in the edgelists and nodelists and the fields in the configuration file are explained in the GitHub repository for SASCA-ReS.

In addition, we provide sj_reccount, a table of real-world citation frequencies that is an input to the SASCA-ReS software. The first column (diff) of sj_reccount lists the difference between the publication year of a citing document and the publication year of a cited document. The second column (count) reports the frequency of such citations across the dataset of 77,879,427 observations, which is derived from the biomedical literature.
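As a minimal sketch of how the gzipped parquet files might be opened, the helper below decompresses a file such as abm14_edgelist.parquet.gz and then hands the result to pandas. It assumes that the .gz suffix means an ordinary gzip wrapper around a standard parquet file, and that pandas with a parquet engine (pyarrow or fastparquet) is installed. The function names gunzip and load_table are illustrative and are not part of SASCA-ReS.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional


def gunzip(src: Path, dst: Optional[Path] = None) -> Path:
    """Decompress a .gz file and return the path of the decompressed copy.

    By default the output sits next to the input with the trailing ".gz"
    removed, e.g. abm14_edgelist.parquet.gz -> abm14_edgelist.parquet.
    """
    src = Path(src)
    if dst is None:
        if src.suffix != ".gz":
            raise ValueError(f"expected a .gz file, got {src.name}")
        dst = src.with_suffix("")  # drops only the final ".gz"
    # Stream in chunks so that very large edgelists are never held in memory.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def load_table(gz_path):
    """Decompress a gzipped parquet file and load it as a DataFrame.

    Assumption: pandas and a parquet engine are available. The import is
    kept local so that gunzip() works without them.
    """
    import pandas as pd

    return pd.read_parquet(gunzip(Path(gz_path)))
```

For instance, load_table("abm14_nodelist.parquet.gz") would return the nodelist as a DataFrame. The larger networks have tens to hundreds of millions of nodes, so reading only the needed columns (the columns argument of pandas.read_parquet) may be preferable to loading a whole table.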
Finally, we share data, composite_maverick_disruption.csv, from the mavericks (unconventional citing strategies) experiment reported in the Park et al. (2025) manuscript available at https://zenodo.org/records/17772113. The columns in the composite_maverick_disruption.csv file are:

node_id -> of agents in the various simulations
n_i, n_j, n_k -> terms used to compute disruption per "Wu, L., Wang, D. & Evans, J.A. Large teams develop and small teams disrupt science and technology. Nature 566, 378–382 (2019). https://doi.org/10.1038/s41586-019-0941-9"
disruption -> the disruption metric of Wu, Wang, and Evans (2019)
type -> maverick type (maximizer, randomnik, or minimizer)
year -> virtual year in the simulation when the maverick was created
alpha -> the alpha parameter of the control agent
pa_weight -> the preferential attachment weight of the control agent phenotype
fit_peak_value -> the fitness value assigned to the control agent
in_degree -> the count of citations accumulated by the maverick or control agent at the end of the simulation
out_degree -> the count of references made by the maverick
tag -> a label for the experiment, e.g. od249_f1 indicates that the mavericks in this experiment made 249 citations and were assigned a fitness value of 1.
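The relationship between n_i, n_j, n_k, and disruption can be sketched as follows. In Wu, Wang, and Evans (2019), the disruption index is D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later works that cite the focal work but none of its references, n_j counts those that cite both the focal work and its references, and n_k counts those that cite only the references. The sample rows below are made-up values for illustration, not taken from the dataset, and the assumption that the CSV header names match the column names listed above has not been checked against the file.

```python
import csv
import io
import math


def disruption(n_i: int, n_j: int, n_k: int):
    """Disruption index of Wu, Wang, and Evans (2019).

    Returns None when the denominator is zero (no subsequent citing works).
    """
    total = n_i + n_j + n_k
    if total == 0:
        return None
    return (n_i - n_j) / total


def recompute(csv_text: str):
    """Recompute disruption per row and pair it with the stored value.

    Assumes header names node_id, n_i, n_j, n_k, disruption.
    """
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        d = disruption(int(row["n_i"]), int(row["n_j"]), int(row["n_k"]))
        results.append((row["node_id"], d, float(row["disruption"])))
    return results


# Hypothetical rows for illustration only (not real data from the file).
SAMPLE = """node_id,n_i,n_j,n_k,disruption
1,8,2,10,0.3
2,1,4,5,-0.3
"""

for node_id, computed, stored in recompute(SAMPLE):
    # Flag rows where the recomputed value disagrees with the stored one.
    status = "ok" if math.isclose(computed, stored, abs_tol=1e-9) else "mismatch"
    print(node_id, computed, status)
```

Values of D range from -1 (every later work cites the focal work together with its references) to 1 (every later work cites the focal work alone), which is a convenient sanity check on the disruption column.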
Provider:
University of Illinois Urbana-Champaign
Created:
2025-12-01