Graph datasets for clustering
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://data.mendeley.com/datasets/fzjyprkh3h
Description:
The CORA dataset consists of seven distinct categories of scientific papers. It comprises 2708 papers, with each paper represented as a node in the network. There are 5429 citation links, each representing a directed edge from one paper (node) to another, indicating a citation relationship. Each paper is represented by a 1433-dimensional feature vector, where each value is 0 or 1, indicating the absence or presence of specific words from a predefined dictionary.
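As a rough illustration of the representation described above, the following sketch builds a binary word-presence feature matrix and a directed adjacency matrix for a tiny made-up citation graph. The vocabulary, the papers, the citation pairs, and the function names `build_features` and `build_adjacency` are all hypothetical and are not taken from the dataset files. The edge direction (citing paper to cited paper) is an assumption as well.

```python
import numpy as np

# Hypothetical toy data: 4 papers and a 5-word dictionary.
# (CORA itself has 2708 papers and a 1433-word dictionary.)
vocabulary = ["graph", "neural", "cluster", "kernel", "bayes"]
paper_words = [
    {"graph", "cluster"},
    {"neural"},
    {"graph", "neural", "kernel"},
    set(),
]
# Assumed convention: each pair is (citing paper, cited paper).
citations = [(0, 1), (2, 0), (2, 1), (3, 2)]


def build_features(paper_words, vocabulary):
    """Return a 0/1 matrix with one row per paper and one column per word."""
    word_index = {word: j for j, word in enumerate(vocabulary)}
    features = np.zeros((len(paper_words), len(vocabulary)), dtype=np.int8)
    for i, words in enumerate(paper_words):
        for word in words:
            features[i, word_index[word]] = 1
    return features


def build_adjacency(citations, num_nodes):
    """Return a directed adjacency matrix with A[src, dst] = 1 per citation."""
    adjacency = np.zeros((num_nodes, num_nodes), dtype=np.int8)
    for src, dst in citations:
        adjacency[src, dst] = 1
    return adjacency


X = build_features(paper_words, vocabulary)
A = build_adjacency(citations, len(paper_words))
# Clustering methods often ignore direction; this symmetrizes the graph.
A_sym = np.maximum(A, A.T)
```

At the real dataset's scale (2708 nodes, 5429 edges), a sparse matrix format would likely be a better fit than the dense arrays used here.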
CITE is a citation network dataset consisting of papers from six distinct research categories: Agents, Artificial Intelligence (AI), Databases (DB), Information Retrieval (IR), Machine Learning (ML), and Human-Computer Interaction (HCI). The dataset comprises 3327 academic papers. Each paper is represented by a 3703-dimensional word vector, indicating the absence or presence of specific words from a predefined dictionary. The dataset also includes 4732 citation links between papers.
The DBLP dataset is derived from the DBLP computer science bibliography and represents a co-authorship network. Each node corresponds to an author, and an edge between two nodes indicates that the corresponding authors have co-authored at least one paper. It contains 4058 nodes and 3528 edges, with each author represented by a 334-dimensional feature vector that describes their research areas.
The ACM dataset is a paper network derived from the ACM database. It contains 3025 papers divided into three categories: database, wireless communication, and data mining. Each paper is represented by a 1870-dimensional vector based on the research area of the article. Two papers are connected by an edge if they share an author.
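The shared-author edge rule can be sketched as follows. This is a minimal example under stated assumptions: the paper IDs, the author names, and the function name `shared_author_edges` are hypothetical, and the dataset's actual construction procedure may differ in its details.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical paper-to-authors mapping (not taken from the ACM dataset).
paper_authors = {
    "p1": ["alice", "bob"],
    "p2": ["bob"],
    "p3": ["carol"],
    "p4": ["alice", "carol"],
}


def shared_author_edges(paper_authors):
    """Return sorted undirected edges between papers that share an author."""
    # Invert the mapping: author -> set of papers they wrote.
    papers_by_author = defaultdict(set)
    for paper, authors in paper_authors.items():
        for author in authors:
            papers_by_author[author].add(paper)

    # Every pair of papers by the same author becomes one edge.
    edges = set()
    for papers in papers_by_author.values():
        for a, b in combinations(sorted(papers), 2):
            edges.add((a, b))
    return sorted(edges)


edges = shared_author_edges(paper_authors)
```

Collecting edges in a set keeps each pair once even when two papers share several authors.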
Created:
2024-06-20



