Cora, Citeseer, CoAuthorCS, Polblogs and SBM

Name: Cora, Citeseer, CoAuthorCS, Polblogs and SBM
Creator: IEEE DataPort
Published: 2025-01-22 05:10:41
License: No description available

Source: DataCite Commons. Updated: 2025-01-22. Indexed: 2025-04-16.

Download link:

https://ieee-dataport.org/documents/cora-citeseer-coauthorcs-polblogs-and-sbm

Resource description:

1. Cora is derived from a multi-group citation network; two-group subgraphs are selected for tasks such as node classification with graph neural networks. Node attributes are sparse bag-of-words feature vectors, and the labels are mostly the topic categories or fields of academic papers. The subgraph focuses on how graph structure and node features influence model predictions, and provides a reliable experimental benchmark for research on multi-step adversarial attacks and defense strategies.

Number of nodes: (example) 652
Edges: (example) 2350
Node feature dimension: 1433
Applicable tasks: node classification, adversarial attack, graph representation learning, etc.

2. Citeseer is also derived from a multi-group citation network and is similar to Cora, but differs in node distribution and feature dimension. In this two-group subgraph, node attributes are again sparse bag-of-words feature vectors, and the labels are mostly the research topics or directions of academic papers. Because the graph structure is relatively complex, and the feature dimension and number of categories differ from Cora, this dataset is often used to compare and verify the generalization and robustness of graph neural network models.

Number of nodes: (example) 852
Edges: (example) 3170
Node feature dimension: 3703
Applicable tasks: node classification, adversarial attack, citation network analysis, etc.

3. CoAuthorCS is a two-group subgraph of a multi-group collaboration network. Each node carries a binary feature vector indicating the presence of keywords, which suits clustering or classification tasks based on the presence or absence of attributes. The dataset highlights the association between node features and collaboration relationships in academic networks, and provides an experimental scenario with binary attributes for research on multi-step adversarial attacks.

Number of nodes: (example) 836
Edges: (example) 2270
Node feature form: binary keyword vector
Applicable tasks: node classification, collaboration relationship analysis, adversarial attack, etc.

4. Polblogs is a real-world dataset of a political blog network, with node labels corresponding to the political orientation of each blog (liberal vs. conservative). The network is usually large, and edges represent reference or link relationships between blogs. It is often used to analyze community division, public opinion diffusion, and adversarial attacks. By treating the node labels as binary classes, researchers can test the effectiveness of adversarial attacks and defense mechanisms in complex community structures.

Number of nodes: (example) 1222
Edges: (example) 16714
Label type: Liberal/Conservative
Applicable tasks: node classification, community division, polarization research, adversarial attack, etc.

5. The Stochastic Block Model (SBM) is a commonly used random graph model for simulating network data with community or block structure. The dataset is produced by a random generation mechanism (for example, by setting the number of communities and the edge probabilities), and each node's label is usually the community it belongs to. Because it is highly controllable and the network size and structure can be adjusted as needed, SBM is often used to study community detection, group behavior simulation, and robustness under adversarial attacks.

Number of nodes: (example) 1490
Edges: (example) 13790
Label type: synthetic community assignment
Applicable tasks: community detection, random graph model research, adversarial attack simulation, etc.
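
The generation mechanism described for SBM (a chosen number of communities plus edge probabilities, with labels taken from community membership) can be sketched as follows. This is a minimal illustration only, not the procedure used to build the published SBM graph: the function name `generate_sbm`, the block sizes, and the probabilities `p_in` and `p_out` are all assumptions made for this example.

```python
import random
from itertools import combinations


def generate_sbm(block_sizes, p_in, p_out, seed=0):
    """Sample an undirected stochastic block model graph.

    block_sizes: number of nodes in each community (hypothetical values).
    p_in:  edge probability for two nodes in the same community.
    p_out: edge probability for two nodes in different communities.
    Returns (labels, edges), where labels[i] is the community of node i
    and edges is a list of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    # Node labels are determined by community membership.
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    edges = []
    # Visit every unordered node pair once and flip a biased coin.
    for u, v in combinations(range(len(labels)), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return labels, edges


# Two communities of 50 nodes each; parameters chosen only for illustration.
labels, edges = generate_sbm([50, 50], p_in=0.2, p_out=0.02, seed=42)
intra = sum(labels[u] == labels[v] for u, v in edges)
print(f"nodes={len(labels)} edges={len(edges)} intra-community={intra}")
```

Raising `p_in` relative to `p_out` gives a sharper community structure, which is the kind of controllability the description refers to. The pairwise loop is quadratic in the number of nodes, which is acceptable at the graph sizes listed here.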

Provider:

IEEE DataPort

Created:

2025-01-22

Collected summary

Background and challenges

Background overview

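This dataset is a collection of five graph datasets (Cora, Citeseer, CoAuthorCS, Polblogs, and SBM), covering citation networks, a collaboration network, a political blog network, and a synthetic random network. The datasets provide sparse bag-of-words feature vectors, binary feature vectors, and node labels, and are suited to research tasks such as graph neural network node classification, multi-step adversarial attacks, and community detection. The collection aims to provide a unified benchmark platform for graph-related experiments.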

The above content was collected and summarized by 遇见数据集.
