
Citation networks for understanding median research teams in idea flow within a research community

DataCite Commons | Updated: 2020-09-01 | Indexed: 2024-07-25
Download link:
https://figshare.com/articles/dataset/NLP6seed/5422612
Description:
<pre>The dataset contains 9 citation graphs obtained from the Web of Science: 3 graphs for each of 3 research areas. The areas are SWF (Switched-capacitor Filters), NLP (Natural Language Processing), and RTT (Real-Time Tracking).

For each area, 3 methods are used to generate the graph. The methods differ in the "seed papers" used as the starting point for building the entire graph. Three seed-paper settings are used:
- 6seed: 6 very highly cited papers in the area, hand-picked and considered to have made significant contributions to the community
- 100cit: the 100 top-cited papers in the search results for the research area at the time
- 100relev: the 100 most relevant papers in the search results for the research area at the time

The datasets are named according to the following rule: [area][seed paper type]_[file type].csv
e.g. NLP6seed_arcs.csv: Natural Language Processing, using 6 very highly cited papers as seed papers; the file contains the arc information.

NLP6seed is the dataset used in the paper. We also provide the supporting data for the other 8 datasets.

Detailed descriptions of the files are given below:
- [dataset]_arcs.csv: the netlist (list of arcs) of the citation graph
- [dataset]_vertices.csv: the node information of the citation graph (paper name, author names, #citations, etc.)
- [dataset]_cluster.xlsx: the group category information of the citation graph. The columns are: GN = group leader name, GP = group publication, GS = group size, GC = group citation count, SC = self-citation count, CL = group category

We also provide the following results:
- bc_[dataset].mat: betweenness change when removing categories B and/or C from the graph
- mbc_[dataset].mat: modified betweenness change when removing categories B and/or C from the graph
- sp_[dataset].mat: shortest path length change when removing categories B and/or C from the graph
- msp_[dataset].mat: modified shortest path length change when removing categories B and/or C from the graph
</pre>
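The naming rule in the description can be illustrated with a small parser. This is only a sketch under stated assumptions: the names `DatasetFile`, `parse_filename`, and `dataset_names` are hypothetical helpers introduced here, and the patterns are derived from the description text alone, not checked against the actual file listing on figshare.

```python
import re
from typing import NamedTuple

# Area codes and seed-paper settings as listed in the dataset description.
AREAS = {
    "SWF": "Switched-capacitor Filters",
    "NLP": "Natural Language Processing",
    "RTT": "Real-Time Tracking",
}
SEED_TYPES = ("6seed", "100cit", "100relev")


class DatasetFile(NamedTuple):
    """Hypothetical record for one parsed file name."""
    area: str        # "SWF", "NLP", or "RTT"
    seed_type: str   # "6seed", "100cit", or "100relev"
    kind: str        # "arcs", "vertices", "cluster", or a result prefix
    extension: str   # "csv", "xlsx", or "mat"


# [area][seed paper type]_[file type].csv (the cluster file is .xlsx)
_DATA_RE = re.compile(
    r"^(SWF|NLP|RTT)(6seed|100cit|100relev)_(arcs|vertices|cluster)\.(csv|xlsx)$"
)
# Result files: bc_/mbc_/sp_/msp_ followed by the dataset name, then .mat
_RESULT_RE = re.compile(r"^(bc|mbc|sp|msp)_(SWF|NLP|RTT)(6seed|100cit|100relev)\.mat$")


def parse_filename(name: str) -> DatasetFile:
    """Split a file name into area, seed type, kind, and extension.

    Assumption: the extension is not cross-checked against the kind,
    so "NLP6seed_cluster.csv" would also be accepted.
    """
    match = _DATA_RE.match(name)
    if match:
        area, seed_type, kind, extension = match.groups()
        return DatasetFile(area, seed_type, kind, extension)
    match = _RESULT_RE.match(name)
    if match:
        kind, area, seed_type = match.groups()
        return DatasetFile(area, seed_type, kind, "mat")
    raise ValueError(f"file name does not follow the naming rule: {name!r}")


def dataset_names() -> list:
    """Enumerate the 9 dataset names (3 areas x 3 seed settings)."""
    return [area + seed for area in AREAS for seed in SEED_TYPES]
```

For instance, `parse_filename("NLP6seed_arcs.csv")` yields the area `NLP`, the seed setting `6seed`, and the kind `arcs`, while `dataset_names()` lists the nine area and seed-setting combinations, including `NLP6seed`, the dataset used in the paper.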

Provider:
figshare
Created:
2017-09-20