Graph datasets for clustering

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://data.mendeley.com/datasets/fzjyprkh3h

Resource description:

The CORA dataset consists of scientific papers in seven distinct categories. It comprises 2708 papers, each represented as a node in the network. There are 5429 citation links, each a directed edge from one paper (node) to another that indicates a citation relationship. Each paper is represented by a 1433-dimensional feature vector in which each value is 0 or 1, indicating the absence or presence of a specific word from a predefined dictionary.

CITE is a citation network dataset consisting of papers from six distinct research categories: Agents, Artificial Intelligence (AI), Databases (DB), Information Retrieval (IR), Machine Learning (ML), and Human-Computer Interaction (HCI). The dataset comprises 3327 academic papers. Each paper is represented by a 3703-dimensional word vector indicating the absence or presence of specific words from a predefined dictionary. The dataset also includes 4732 citation links between papers.

The DBLP dataset is derived from the DBLP computer science bibliography and represents a co-authorship network. Each node corresponds to an author, and an edge between two nodes indicates that the corresponding authors have co-authored at least one paper. It contains 4058 nodes and 3528 edges, with each author represented by a 334-dimensional feature vector that describes their research areas.

The ACM dataset is a paper network derived from the ACM database. It contains 3025 papers in three categories: database, wireless communication, and data mining. Each paper is represented by a 1870-dimensional vector based on the research area of the article. There is an edge between two papers if they are written by the same author.
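The figures quoted in the description can be collected into a small summary for comparison. The following is a minimal sketch, not part of the dataset distribution: the `GraphStats` class and `DATASETS` list are hypothetical names introduced here, the average degree treats every edge as undirected (an assumption, since the CORA links are described as directed), and values the description does not state are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GraphStats:
    """Hypothetical container for the counts quoted in the description."""
    name: str
    nodes: int
    edges: Optional[int]    # None where the description gives no edge count
    feature_dim: int
    classes: Optional[int]  # None where the description gives no class count

    def average_degree(self) -> Optional[float]:
        """Mean degree 2E/N, assuming every edge is counted as undirected."""
        if self.edges is None:
            return None
        return 2 * self.edges / self.nodes


# Numbers copied from the resource description above.
DATASETS = [
    GraphStats("CORA", 2708, 5429, 1433, 7),
    GraphStats("CITE", 3327, 4732, 3703, 6),
    GraphStats("DBLP", 4058, 3528, 334, None),
    GraphStats("ACM", 3025, None, 1870, 3),
]

for d in DATASETS:
    deg = d.average_degree()
    deg_text = f"{deg:.2f}" if deg is not None else "n/a"
    print(f"{d.name:5s} nodes={d.nodes} edges={d.edges} "
          f"dim={d.feature_dim} avg_degree={deg_text}")
```

Under these assumptions all four graphs come out sparse, which is worth keeping in mind when choosing a clustering method that relies on neighborhood structure.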

Created:

2024-06-20
