Data from development and evaluation of SASCA-s: Scalable Agent-based Simulator for Citation Analysis with simulation
Source: DataCite Commons. Updated: 2025-09-08. Indexed: 2026-05-03.
Download link:
https://databank.illinois.edu/datasets/IDB-3926377
Description:
This dataset consists of compressed output files, in the form of edgelists (*.edgelist.gz) and nodelists (*.aux.parquet), from large citation network simulations run with an agent-based model. The code and instructions are available at: https://github.com/illinois-or-research-analytics/SASCA. In addition, we provide a distribution of citation frequencies drawn from a random sample of PubMed journal articles (pooled_50k_pubmed_unique.csv) and a table of recencies, that is, the frequency with which citations are made to the previous year, the year before that, and so on (recency_probs_percent_stahl_filled.csv). A manuscript describing the SASCA-s simulator has been submitted for review and will be referenced in a future version of this data repository if it is accepted. The prefixes sj and er refer, respectively, to the real-world network and the Erdos-Renyi random graph that were used to initiate simulations. These 'seed' networks are available from the GitHub site referenced above.
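As a rough illustration of how one of the *.edgelist.gz files might be inspected, the sketch below streams a gzip-compressed edge list and tallies how often each node is cited. It rests on assumptions not stated in the description: one edge per line, two fields per edge (citing node first, cited node second), whitespace- or comma-delimited, and no header row. The function names are hypothetical and are not part of the SASCA code base. Check the repository instructions for the actual layout before relying on it.

```python
import gzip
from collections import Counter


def read_edges(path, delimiter=None):
    """Yield (citing, cited) pairs from a gzip-compressed edge list.

    Assumption: one edge per line, citing node first, cited node second.
    delimiter=None splits on any whitespace; pass "," for comma-separated files.
    Blank lines and lines starting with '#' are skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split(delimiter)
            if len(fields) < 2:
                continue  # ignore malformed rows rather than guessing
            yield fields[0], fields[1]


def in_degree_counts(path, delimiter=None):
    """Count how many times each node appears as the cited end of an edge."""
    counts = Counter()
    for _citing, cited in read_edges(path, delimiter):
        counts[cited] += 1
    return counts


def degree_frequency(counts):
    """Map each citation count to the number of nodes having that count."""
    return Counter(counts.values())
```

Calling `degree_frequency(in_degree_counts(path))` on a downloaded file would give a citation frequency distribution for the simulated network. Nodes that are never cited do not appear in the tally, and if the files turn out to carry a header row, that row would need to be skipped first.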
Provider:
University of Illinois Urbana-Champaign
Created:
2025-08-16



