The disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice: synthetic citation networks for bibliometric null models
NIAID Data Ecosystem, indexed 2026-05-02
Download link:
http://datadryad.org/dataset/doi%253A10.6071%252FM3G674
Description:
We demonstrate that the disruption index (CD) recently applied to publication and patent citation networks by Park et al. (Nature, 2023) systematically decreases over time due to secular growth in research and patent production, following two distinct mechanisms unrelated to innovation, the first structural and the second behavioral. The structural explanation follows from ‘citation inflation’ (CI) (Petersen et al., Research Policy, 2018), an inextricable feature of real citation networks. One driver of CI is the ever-increasing length of reference lists, which causes the CD index to systematically decrease. The behavioral explanation reflects shifts in scholarly citation practice (e.g. self-citation) that increase the rate of triadic closure in citation networks and confound efforts to measure disruptive innovation using CD. Combined, these two mechanisms render CD unsuitable for cross-temporal analysis, and call into question the interpretations provided by Park et al.
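The structural mechanism can be illustrated with a toy calculation. The sketch below is not taken from the enclosed data or code. It assumes the commonly used form of the index, CD = (n_i - n_j) / (n_i + n_j + n_k), where among later papers n_i cite only the focal paper, n_j cite both the focal paper and at least one of its references, and n_k cite only its references. The function name `cd_index`, its arguments, and the paper labels are hypothetical.

```python
def cd_index(focal, focal_refs, later_citations):
    """Toy CD index for one focal paper (assumed standard definition).

    focal_refs: set of papers cited by the focal paper.
    later_citations: dict mapping each later paper to the set of papers it cites.
    """
    n_i = n_j = n_k = 0
    for cited in later_citations.values():
        cites_focal = focal in cited
        cites_refs = bool(cited & focal_refs)
        if cites_focal and cites_refs:
            n_j += 1   # cites the focal paper and its references
        elif cites_focal:
            n_i += 1   # cites the focal paper only
        elif cites_refs:
            n_k += 1   # cites the references only
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# The same five later papers are used in both cases (illustrative labels).
later = {
    "A": {"F"},
    "B": {"F", "R3"},
    "C": {"F", "R1"},
    "D": {"R2"},
    "E": {"R4"},
}

# Only the length of the focal paper's reference list differs.
cd_short = cd_index("F", {"R1", "R2"}, later)
cd_long = cd_index("F", {"R1", "R2", "R3", "R4"}, later)
print(cd_short, cd_long)
```

In this toy case, lengthening the focal paper's reference list moves later papers out of the n_i group and into the n_j and n_k groups, so the index falls even though the citations received by the focal paper are unchanged.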
Methods
Enclosed data accompany the following publications:
Alexander M. Petersen, Felber Arroyave, Fabio Pammolli (2025). The disruption index suffers from citation inflation: re-analysis of temporal CD trend and relationship with team size reveal discrepancies. Journal of Informetrics 19, 101605. DOI:10.1016/j.joi.2024.101605
Alexander M. Petersen, Felber Arroyave, Fabio Pammolli (2024). The disruption index is biased by citation inflation. Quantitative Science Studies 5, 936-953. DOI:10.1162/qss_a_00333
To summarize, enclosed are two types of data:
1) Empirical publication-level data accompanied by code (do-files) for running multivariate regressions in Stata
2) Raw network data produced for 6 citation network scenarios. Each scenario includes 4 synthetic networks, for a total of 24 citation networks. Each citation network comprises 125270 nodes that were systematically added in cohorts. The networks therefore represent a null model for evolving citation networks and are useful for benchmarking existing and new bibliometric measures. These data were generated using a synthetic citation network model developed and reported in:
Pan, R. K., Petersen, A. M., Pammolli, F. & Fortunato, S. The memory of science: Inflation, myopia, and the knowledge network. Journal of Informetrics 12, 656–678 (2018).
Created:
2025-02-05



