Cora, Citeseer, CoAuthorCS, Polblogs and SBM
DataCite Commons. Updated 2025-01-22. Indexed 2025-04-16.
Download link:
https://ieee-dataport.org/documents/cora-citeseer-coauthorcs-polblogs-and-sbm
Description:
1. The Cora dataset is derived from a multi-group citation network; a two-group subgraph is selected for tasks such as node classification with graph neural networks. Node attributes are sparse bag-of-words feature vectors, and the labels are mostly the topic categories or fields of the academic papers. The subgraph is intended for studying how graph structure and node features influence model predictions, and it serves as an experimental benchmark for research on multi-step adversarial attacks and defense strategies.
Number of nodes: (example) 652
Number of edges: (example) 2350
Node feature dimension: 1433
Applicable tasks: node classification, adversarial attack, graph representation learning, etc.
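The description does not say how the two-group subgraph was selected. One plausible reading is an induced subgraph: keep only the nodes of two chosen classes and the edges between them. The sketch below illustrates that reading on made-up data; the function name `two_group_subgraph`, the toy labels, and the 0/1 relabeling are assumptions for illustration, not the dataset authors' procedure.

```python
def two_group_subgraph(edges, labels, keep):
    """Return the induced subgraph on nodes whose label is in `keep`.

    edges  : iterable of (u, v) pairs, treated as undirected
    labels : dict mapping node -> class label
    keep   : the two class labels to retain (hypothetical choice)
    """
    keep = set(keep)
    nodes = {n for n, y in labels.items() if y in keep}
    # Keep an edge only if both endpoints survive the label filter.
    sub_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    # Relabel the retained classes as 0/1 (sorted order) for binary tasks.
    class_index = {c: i for i, c in enumerate(sorted(keep))}
    sub_labels = {n: class_index[labels[n]] for n in nodes}
    return nodes, sub_edges, sub_labels


# Toy citation graph with three topic classes (made-up data).
toy_labels = {0: "ML", 1: "ML", 2: "DB", 3: "DB", 4: "HCI"}
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]

nodes, sub_edges, sub_labels = two_group_subgraph(
    toy_edges, toy_labels, {"ML", "DB"}
)
print(sorted(nodes))  # [0, 1, 2, 3]
print(sub_edges)      # [(0, 1), (1, 2), (2, 3)]
```

Edges touching the dropped class (here node 4) disappear, so the subgraph can have a noticeably different degree distribution from the full graph.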
2. Citeseer is also derived from a multi-group citation network and is similar to Cora, but differs in node distribution and feature dimension. In this two-group subgraph, node attributes are again sparse bag-of-words feature vectors, and the labels are mostly the research topics or directions of the academic papers. Because its graph structure is relatively complex and its feature dimension and number of categories differ from Cora's, this dataset is often used to compare and verify the generalization and robustness of graph neural network models.
Number of nodes: (example) 852
Number of edges: (example) 3170
Node feature dimension: 3703
Applicable tasks: node classification, adversarial attack, citation network analysis, etc.
3. CoAuthorCS is a two-group subgraph of a multi-group collaboration network. Each node has a binary feature vector indicating the presence or absence of keywords, which makes the dataset suitable for clustering or classification tasks based on attribute presence. It highlights the association between node features and collaboration relationships in academic networks, and provides an experimental scenario with binary attributes for research on multi-step adversarial attacks.
Number of nodes: (example) 836
Number of edges: (example) 2270
Node feature form: binary keyword vector
Applicable tasks: node classification, collaboration analysis, adversarial attack, etc.
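A binary keyword vector records only whether each vocabulary keyword is present for a node, not how often it occurs. The following minimal sketch shows the encoding on made-up data; the vocabulary, the author names, and the helper `binary_keyword_vector` are hypothetical and are not taken from the dataset.

```python
def binary_keyword_vector(node_keywords, vocabulary):
    """Encode a set of keywords as a 0/1 vector over `vocabulary`."""
    index = {kw: i for i, kw in enumerate(vocabulary)}
    vec = [0] * len(vocabulary)
    for kw in node_keywords:
        if kw in index:  # keywords outside the vocabulary are ignored
            vec[index[kw]] = 1
    return vec


# Hypothetical vocabulary and per-author keyword sets.
vocab = ["graph", "attack", "robustness", "database"]
author_keywords = {
    "alice": {"graph", "attack"},
    "bob": {"database"},
}

features = {
    author: binary_keyword_vector(kws, vocab)
    for author, kws in author_keywords.items()
}
print(features["alice"])  # [1, 1, 0, 0]
print(features["bob"])    # [0, 0, 0, 1]
```

With features of this form, a feature-level perturbation amounts to flipping individual entries between 0 and 1.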
4. Polblogs is a real-world dataset describing a network of political blogs, with node labels corresponding to each blog's political orientation (liberal vs. conservative). The network is relatively large, and its edges represent references or hyperlinks between blogs. It is often used to analyze community division, the diffusion of public opinion, adversarial attacks, and related problems. Because the node labels form a binary classification, researchers can test the effectiveness of adversarial attacks and defense mechanisms in a complex community structure.
Number of nodes: (example) 1222
Number of edges: (example) 16714
Label type: Liberal/Conservative
Applicable tasks: node classification, community division, polarization research, adversarial attack, etc.
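One simple way to quantify community division in a two-label network of this kind is edge homophily: the fraction of edges whose endpoints share a label. The description does not mention this metric; it is offered here only as an illustrative sketch on made-up data, and the function name `edge_homophily` is an assumption.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy blog network (made-up data): three liberal and two conservative blogs.
toy_labels = {
    "a": "liberal", "b": "liberal", "e": "liberal",
    "c": "conservative", "d": "conservative",
}
toy_edges = [("a", "b"), ("c", "d"), ("b", "c"), ("a", "e")]

print(edge_homophily(toy_edges, toy_labels))  # 0.75
```

A value near 1 means that links stay mostly within one orientation; an attack that adds cross-community edges would push the value down.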
5. The Stochastic Block Model (SBM) is a commonly used random graph model for simulating network data with community or block structure. The dataset is produced by a random generation mechanism (for example, by setting the number of communities and the edge probabilities), and each node's label is determined by the community it belongs to. Because SBM is highly controllable and the network size and structure can be adjusted as required, it is often used to study community detection, group behavior simulation, and robustness under adversarial attacks.
Number of nodes: (example) 1490
Number of edges: (example) 13790
Label type: synthetic community membership
Applicable tasks: community detection, random graph model research, adversarial attack simulation, etc.
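The generation mechanism can be sketched with the simplest two-parameter variant of the model, in which every pair of nodes in the same community is linked with probability `p_in` and every pair in different communities with probability `p_out`. The block sizes, probabilities, and seed below are arbitrary illustrative values; the parameters used to produce this dataset are not stated in the description.

```python
import random
from itertools import combinations


def generate_sbm(block_sizes, p_in, p_out, seed=0):
    """Sample an undirected SBM graph with one label per community.

    block_sizes : list with the number of nodes in each community
    p_in, p_out : edge probability within / between communities
    """
    rng = random.Random(seed)
    labels = []
    for block, size in enumerate(block_sizes):
        labels.extend([block] * size)
    edges = []
    # Visit every unordered node pair once and flip a biased coin.
    for u, v in combinations(range(len(labels)), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return labels, edges


# Illustrative parameters only (not the dataset's actual settings).
labels, edges = generate_sbm([50, 50], p_in=0.2, p_out=0.02, seed=42)
intra = sum(1 for u, v in edges if labels[u] == labels[v])
print(len(labels), len(edges), intra)
```

The general model replaces the two probabilities with a full matrix of block-to-block probabilities. Raising `p_out` toward `p_in` blurs the community structure, which is one way to control task difficulty.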
Provider:
IEEE DataPort
Created:
2025-01-22
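Collected summary
Background and challenges
Background overview
This dataset is a collection of five graph datasets (Cora, Citeseer, CoAuthorCS, Polblogs, and SBM) covering citation networks, a collaboration network, a political blog network, and synthetic random networks. The datasets provide sparse bag-of-words feature vectors, binary feature vectors, and node labels, and are suited to research tasks such as graph neural network node classification, multi-step adversarial attacks, and community detection. The collection aims to provide a unified benchmark platform for graph-related experiments.
The above content was collected and summarized by 遇见数据集.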



