Academic co-word network link prediction dataset
Science Data Bank · Updated 2023-05-05 · Indexed 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=c6facf07dfd04977901bea24df6b27f6
Description:
This dataset uses the Web of Science Core Collection as its source of literature data. Five disciplines with different data scales were randomly selected from the Core Collection's subject category catalog: Information Science & Library Science (ISLS), Law (LAW), Biomedical Social Sciences (BSS), Communication (COM), and Oceanography (Ocean). Records were retrieved with the query "WC=subject name", with the publication year restricted to 2015 through 2020. For each discipline, the literature from 2015 to 2018 forms the training stage of the co-word network, and the literature from 2019 to 2020 forms the testing stage (denoted N2). Keywords were extracted separately for the training and testing stages. Only keywords that appear in both stages can serve as nodes of a co-word network, so the intersection of the two keyword sets was taken to obtain the keyword information needed to build the network for each discipline. After experimental analysis, keywords with frequencies greater than 4, 6, 8, 10, 12, and 14 were extracted to construct co-word networks with different topological features.
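The construction described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dataset authors' code: the record layout (dicts with `year` and `keywords` fields) and all function names are hypothetical, keyword frequency is assumed to be counted once per paper within the training stage, and an edge weight is assumed to be the number of papers in which two keywords co-occur. The description does not specify any of these details.

```python
from collections import Counter
from itertools import combinations

# Hypothetical record layout: {"year": int, "keywords": [str, ...]}

def split_by_year(records, train_years=range(2015, 2019), test_years=range(2019, 2021)):
    """Split records into training (2015-2018) and testing (2019-2020) stages."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test

def keyword_counts(records):
    """Count the number of papers each keyword appears in."""
    counts = Counter()
    for r in records:
        counts.update(set(r["keywords"]))
    return counts

def coword_edges(records, nodes):
    """Count co-occurrences of node keywords within the same paper."""
    edges = Counter()
    for r in records:
        kws = sorted(set(r["keywords"]) & nodes)
        edges.update(combinations(kws, 2))
    return edges

def build_networks(records, min_freq):
    """Return (nodes, training edges, testing edges) for one frequency threshold."""
    train, test = split_by_year(records)
    train_counts = keyword_counts(train)
    test_counts = keyword_counts(test)
    # Nodes must appear in both stages.
    shared = set(train_counts) & set(test_counts)
    # Assumption: the threshold applies to the training-stage frequency.
    nodes = {k for k in shared if train_counts[k] > min_freq}
    return nodes, coword_edges(train, nodes), coword_edges(test, nodes)
```

Calling `build_networks(records, t)` for each `t` in `(4, 6, 8, 10, 12, 14)` would give one network pair per threshold. In a link prediction setting, keyword pairs that are present in the testing edges but absent from the training edges would be the links to predict.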
Provided by:
Southwest University; Sichuan University
Created:
2023-04-25



