Research institutions clustering based on the intensity of academic collaboration

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://zenodo.org/record/2616697

Resource description:

The research institutions were clustered with the Louvain modularity algorithm, a state-of-the-art method for identifying communities (clusters) in large networks. Modularity is a value between -1 and 1 that compares the density of edges inside communities with the density of edges between communities. Optimizing this value in principle yields the best possible grouping of the nodes of a given network; the Louvain method optimizes it heuristically, because searching every possible grouping is impractical for large networks.

In our exercise, the Louvain method was applied to identify clusters of institutions within the ACM and SSRN networks. Nodes are institutions, and edges are weighted by the intensity of research collaboration, measured as the number of papers co-authored by authors affiliated with those institutions. For example, including the paper "Fast unfolding of communities in large networks" by V. D. Blondel (Universite Catholique de Louvain), J-L. Guillaume (Universite Pierre et Marie Curie), R. Lambiotte (Imperial College London) and Etienne Lefebvre (Universite Catholique de Louvain) would change the edges in our analysis as follows:
“Universite catholique de Louvain” ⇔ “Imperial College London” =+1
“Universite catholique de Louvain” ⇔ “Universite Pierre et Marie Curie” =+1
“Imperial College London” ⇔ “Universite Pierre et Marie Curie” =+1
Blondel and Lefebvre share an affiliation, so their collaboration adds no edge: only pairs of distinct institutions are counted.

In our largest network we analyse 5,362 institution nodes connected by 147,482 edges. The number of identified clusters depends strongly on the resolution, a parameter of the Louvain community detection algorithm that affects the size of the recovered clusters. Smaller resolutions recover smaller, and therefore more numerous, clusters; conversely, larger values recover clusters containing more data points. In all clusterings we used the default resolution (1.0) of Gephi, a popular network-analysis package. A resolution of 1.0 yields a moderate number of clusters with a satisfactory statistical distribution.
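
The edge-counting rule and the modularity score described above can be illustrated with a short Python sketch. It is not the dataset authors' code and does not run Louvain itself (tools such as Gephi, or networkx's `louvain_communities`, perform the node-moving and aggregation phases); it only builds pair counts from affiliation lists and scores a given partition. The helper names `collaboration_edges` and `modularity`, and the toy institutions `A1` to `B3`, are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def collaboration_edges(papers):
    """Count co-authored papers for every unordered pair of institutions.

    Each paper is the list of its authors' affiliations. A paper adds +1 to
    each pair of *distinct* institutions on it, so two authors from the same
    institution add no self-loop.
    """
    weights = Counter()
    for affiliations in papers:
        for a, b in combinations(sorted(set(affiliations)), 2):
            weights[(a, b)] += 1
    return weights


def modularity(weights, partition, resolution=1.0):
    """Weighted modularity of a partition given as {institution: cluster_id}.

    Q = sum_c [ L_c / m - resolution * (d_c / (2 m))**2 ], where m is the total
    edge weight, L_c the edge weight inside cluster c and d_c the summed
    weighted degree of the institutions in c.
    """
    m = sum(weights.values())
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (a, b), w in weights.items():
        degree[partition[a]] += w
        degree[partition[b]] += w
        if partition[a] == partition[b]:
            internal[partition[a]] += w
    return sum(internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
               for c in degree)


# The Blondel et al. paper from the text: four authors, three institutions.
edges = collaboration_edges([[
    "Universite catholique de Louvain",  # V. D. Blondel
    "Universite Pierre et Marie Curie",  # J-L. Guillaume
    "Imperial College London",           # R. Lambiotte
    "Universite catholique de Louvain",  # E. Lefebvre
]])
for (a, b), w in sorted(edges.items()):
    print(f"{a} <=> {b} = +{w}")

# Hypothetical toy network: two tightly collaborating groups, one bridge paper.
toy = collaboration_edges([
    ["A1", "A2"], ["A1", "A3"], ["A2", "A3"], ["A1", "A2", "A3"],
    ["B1", "B2"], ["B1", "B3"], ["B2", "B3"], ["B1", "B2", "B3"],
    ["A1", "B1"],
])
nodes = ["A1", "A2", "A3", "B1", "B2", "B3"]
q_split = modularity(toy, {n: n[0] for n in nodes})    # clusters {A*}, {B*}
q_single = modularity(toy, {n: "all" for n in nodes})  # one big cluster
print(f"two clusters: Q = {q_split:.3f}; one cluster: Q = {q_single:.3f}")
```

On the toy network, splitting the two groups scores about 0.42, against 0 for lumping every institution into one cluster, which is the kind of contrast Louvain exploits. Note that in this formulation `resolution` scales the null-model term, so values above 1 favour smaller clusters; that is the networkx convention, and it appears to run opposite to the Gephi setting described above, so only the shared default of 1.0 corresponds directly.
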
Source: Association for Computing Machinery (ACM)

Characteristics of the ACM data set, following the geographical classification:
- Number of institutions: 5,477
- Number of papers: 674,684
- Number of countries: 122
- Years: 2011-2018

Because ACM contains publications across many areas of computer science, a more in-depth analysis requires classifying papers into fields of interest. The analysis considered three broad areas:
- Artificial intelligence and machine learning
- Technology (hardware, emerging technologies, infrastructure)
- Social issues

The three categories were defined through expert analysis of the 1,000 most frequent keywords in the dataset. If a term from a category's keyword list appeared among a paper's keywords, the paper was assigned to that category, so a paper could belong to more than one category.

Files:
- mod_ai.csv (based on keywords related to artificial intelligence)
- mod_tech.csv (based on keywords related to technologies)
- mod_soc.csv (based on keywords related to social issues)
- mod_all.csv (based on all papers)
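
The keyword rule above is a simple multi-label assignment. The Python sketch below shows it under stated assumptions: the keyword sets in `CATEGORY_KEYWORDS` are hypothetical placeholders (the actual expert-selected lists are not reproduced here), matching is assumed to be exact and case-insensitive, and the labels `ai`, `tech`, `soc` and `all` merely echo the file names. The helpers `assign_categories` and `split_by_category` are illustrative, not part of the dataset.

```python
# Hypothetical keyword sets; the dataset's real expert-selected lists differ.
CATEGORY_KEYWORDS = {
    "ai": {"machine learning", "deep learning", "neural networks"},
    "tech": {"hardware", "cloud computing", "internet of things"},
    "soc": {"privacy", "ethics", "education"},
}


def assign_categories(keywords, category_keywords=CATEGORY_KEYWORDS):
    """Return every category sharing at least one term with the paper's keywords.

    Assumes exact, case-insensitive matching; a paper may get several labels.
    """
    terms = {k.strip().lower() for k in keywords}
    return {cat for cat, vocab in category_keywords.items() if terms & vocab}


def split_by_category(papers, category_keywords=CATEGORY_KEYWORDS):
    """Group paper ids per category; "all" keeps every paper regardless of match."""
    groups = {cat: [] for cat in category_keywords}
    groups["all"] = []
    for paper_id, keywords in papers.items():
        groups["all"].append(paper_id)
        for cat in sorted(assign_categories(keywords, category_keywords)):
            groups[cat].append(paper_id)
    return groups


# Hypothetical papers: p1 matches two categories, p3 matches none.
groups = split_by_category({
    "p1": ["Machine Learning", "Privacy"],
    "p2": ["Cloud Computing"],
    "p3": ["Compilers"],
})
print(groups)
```

Here `p1` lands in both the AI and social-issues groups, mirroring the rule that a paper may be assigned to more than one group, while `p3` appears only under `all`.
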

Created: 2020-01-24
