
The disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice: synthetic citation networks for bibliometric null models

Source: DataONE. Updated: 2025-02-05. Indexed: 2025-04-26.
Download link:
https://search.dataone.org/view/sha256:ad868eeb427a0a7c40c27b18f08fd1ac7d4d5af3f219f6a1008c9e79dcee19e3
Description:
We demonstrate that the disruption index (CD), recently applied to publication and patent citation networks by Park et al. (Nature, 2023), systematically decreases over time due to secular growth in research and patent production. This decrease follows from two distinct mechanisms unrelated to innovation: the first structural, the second behavioral. The structural explanation follows from 'citation inflation' (CI) (Petersen et al., Research Policy, 2018), an inextricable feature of real citation networks. One driver of CI is the ever-increasing length of reference lists, which causes the CD index to decrease systematically. The behavioral explanation reflects shifts in scholarly citation practice (e.g. self-citation) that increase the rate of triadic closure in citation networks and confound efforts to measure disruptive innovation using CD. Combined, these two mechanisms render CD unsuitable for cross-temporal analysis and call into question the interpretations provided by Park et al.

The enclosed data accompany the following publications:
Alexander M. Petersen, Felber Arroyave, Fabio Pammolli (2025). The disruption index suffers from citation inflation: re-analysis of temporal CD trend and relationship with team size reveal discrepancies. Journal of Informetrics 19, 101605. DOI: 10.1016/j.joi.2024.101605
Alexander M. Petersen, Felber Arroyave, Fabio Pammolli (2024). The disruption index is biased by citation inflation. Quantitative Science Studies 5, 936-953. DOI: 10.1162/qss_a_00333

To summarize, two types of data are enclosed: 1) empirical publication-level data accompanied by code (do-files) for running multivariate regressions in STATA; 2) raw network data produced for 6 citation network scenarios. For each scenario we include 4 synthetic networks, for a total of 24 citation networks.
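To make the reference-list mechanism concrete, here is a minimal sketch of the CD index in its commonly cited form, CD = (n_i - n_j) / (n_i + n_j + n_k). Among papers published after the focal paper, n_i cite the focal paper but none of its references, n_j cite the focal paper and at least one of its references, and n_k cite at least one reference but not the focal paper. This is an illustration, not the code used for this dataset. The function name `cd_index`, the toy papers `F`, `A`, `B`, `P1` to `P3`, and the choice to omit any citation window are assumptions made for the example.

```python
def cd_index(focal, cites, time):
    """CD index of `focal`, with no citation window (every later paper counts).

    cites: dict mapping every paper id to the set of paper ids it cites.
    time:  dict mapping every paper id to its publication time.
    """
    refs = cites.get(focal, set())
    n_i = n_j = n_k = 0
    for paper, cited in cites.items():
        if time[paper] <= time[focal]:
            continue  # only papers published after the focal paper are counted
        cites_focal = focal in cited
        cites_refs = bool(cited & refs)  # cites at least one of focal's references
        if cites_focal and not cites_refs:
            n_i += 1  # cites focal only (disruptive signal)
        elif cites_focal and cites_refs:
            n_j += 1  # cites focal and its references (consolidating signal)
        elif cites_refs:
            n_k += 1  # cites focal's references only
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# Hypothetical toy network: F (t=1) cites A and B (t=0); P1..P3 appear at t=2.
times = {"A": 0, "B": 0, "F": 1, "P1": 2, "P2": 2, "P3": 2}
short_refs = {"A": set(), "B": set(), "F": {"A", "B"},
              "P1": {"F"}, "P2": {"F"}, "P3": {"A"}}
# Same citing papers, but P1 and P2 each list one more reference (one of F's own).
long_refs = {**short_refs, "P1": {"F", "A"}, "P2": {"F", "B"}}

print(round(cd_index("F", short_refs, times), 3))  # 0.667
print(round(cd_index("F", long_refs, times), 3))   # -0.667
```

In this toy case, nothing about the focal paper changes. The citing papers merely list one extra reference each, closing a triangle with one of F's references, and CD flips from +2/3 to -2/3. Longer reference lists make such co-citations more likely, and so can push CD down without any change in the underlying innovation.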
Each of the 24 citation networks comprises 125,270 nodes that were systematically added in cohorts. Each network therefore represents a null model for evolving citation networks and is useful for benchmarking existing and new bibliometric measures.

The enclosed code was developed using 1) STATA 13.0 and 2) Mathematica 13, both of which should be backward compatible with newer software versions. The document README.pdf provides detailed descriptions of the enclosed data and code.

Dataset DOI: https://doi.org/10.6071/M3G674

Description of the data and file structure: the empirical publication-level data (1) are provided in tabular file format, and the raw network data (2) in sparse network representation format. The data and code for 1) and 2) are organized into subfolders, the contents and functionality of which are descri...
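The idea of a cohort-grown null model can be illustrated with a deliberately simple generator. Nodes arrive in cohorts, and each new node cites a fixed number of distinct nodes drawn uniformly at random from earlier cohorts. Both cohort size and reference-list length are allowed to grow over time, which loosely mimics citation inflation. This is a hypothetical sketch, not the generative model behind the 6 scenarios in this dataset; those are documented in README.pdf. The function `grow_cohort_network`, the uniform-attachment rule, and the parameter values are assumptions, and the (citing, cited) edge list is only one possible sparse representation.

```python
import random


def grow_cohort_network(cohort_sizes, ref_lengths, seed=0):
    """Hypothetical toy null model (not the dataset's generator).

    Nodes arrive in cohorts. Each node in cohort c cites ref_lengths[c]
    distinct nodes drawn uniformly at random from all earlier cohorts
    (fewer if not enough earlier nodes exist yet).
    Returns (edges, cohort_of): a sparse list of (citing, cited) pairs and
    the cohort index of every node id.
    """
    rng = random.Random(seed)
    edges = []
    cohort_of = []
    earlier = []  # node ids from previous cohorts, eligible to be cited
    next_id = 0
    for c, (size, r) in enumerate(zip(cohort_sizes, ref_lengths)):
        new_nodes = list(range(next_id, next_id + size))
        for node in new_nodes:
            k = min(r, len(earlier))
            for target in rng.sample(earlier, k):
                edges.append((node, target))
            cohort_of.append(c)
        earlier.extend(new_nodes)  # this cohort becomes citable by later ones
        next_id += size
    return edges, cohort_of


# Illustrative parameters: both output and reference-list length grow per cohort.
sizes = [10, 20, 40, 80]
ref_lengths = [0, 3, 6, 12]
edges, cohort_of = grow_cohort_network(sizes, ref_lengths, seed=42)

out_degree = [0] * len(cohort_of)
for citing, _ in edges:
    out_degree[citing] += 1

for c in range(len(sizes)):
    members = [n for n, k in enumerate(cohort_of) if k == c]
    mean_refs = sum(out_degree[n] for n in members) / len(members)
    print(c, mean_refs)  # prints 0 0.0, 1 3.0, 2 6.0, 3 12.0 on successive lines
```

Because citations only point to earlier cohorts, the output is a valid evolving citation network. Any bibliometric measure, such as a CD implementation, can then be computed cohort by cohort to check whether it drifts purely because of growth.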
Created: 2025-02-05