
DKS Datasets and Source Codes

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/wfbnhy5gvw
Description:
1. Five "shell" scripts for running DKS on the five standard real-world datasets: shell-citeseer.py, shell-cornell.py, shell-wisconsin.py, shell-toy.py, and shell-video.py.
2. Four pieces of source code: the DKS code, the contaminated graph generation code, the incomplete graph generation code, and the query generation code.
3. Five standard real-world datasets from various domains:
   (1) "CiteSeer" is a standard citation network dataset, where nodes represent documents, edges represent citation links, and keywords are the bag-of-words representation of each paper.
   (2) "Cornell" and "Wisconsin" are two subdatasets of a webpage dataset collected from the computer science departments of various universities, where nodes denote web pages, edges denote hyperlinks between pages, and keywords are the bag-of-words representation of each page.
   (3) "Toy" and "Video" are co-purchase networks, where nodes denote products and keywords are product features. An edge connects two products if the same customer purchased both.
4. Three extended datasets:
   (1) "Pubmed" is another standard citation network dataset, where nodes represent documents, edges represent citation links, and keywords are the bag-of-words representation of each paper.
   (2) "Chameleon" and "Squirrel" are very dense heterogeneous knowledge-graph style datasets. In both, nodes represent Wikipedia entries, edges represent links between entries, and keywords are descriptive terms for the entries.
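
The co-purchase rule described for "Toy" and "Video" (an edge between two products bought by the same customer) can be illustrated with a minimal Python sketch. This is not the authors' code and does not reflect the dataset's actual file format: the function name build_copurchase_graph, the input dictionaries, and the sample customers and products are all hypothetical, chosen only to show how a keyword-attributed graph of this kind might be assembled in memory.

```python
from collections import defaultdict
from itertools import combinations


def build_copurchase_graph(purchases, product_keywords):
    """Build an undirected, keyword-attributed co-purchase graph.

    purchases: dict mapping a customer id to the product ids that customer bought.
    product_keywords: dict mapping a product id to its set of keywords.
    Both inputs are hypothetical in-memory structures (an assumption of this
    sketch), not the format used by the published datasets.
    """
    adjacency = defaultdict(set)
    for products in purchases.values():
        # Every pair of distinct products bought by one customer gets an edge.
        for u, v in combinations(sorted(set(products)), 2):
            adjacency[u].add(v)
            adjacency[v].add(u)
    # Products that were never co-purchased still appear as isolated nodes.
    for product in product_keywords:
        adjacency.setdefault(product, set())
    # Attach the keyword set to every node (empty if no keywords are known).
    keywords = {p: set(product_keywords.get(p, ())) for p in adjacency}
    return dict(adjacency), keywords


# Made-up sample data for illustration only.
purchases = {
    "alice": ["robot", "puzzle", "dvd"],
    "bob": ["puzzle", "dvd"],
    "carol": ["kite"],
}
product_keywords = {
    "robot": {"toy", "battery"},
    "puzzle": {"toy", "wooden"},
    "dvd": {"video", "family"},
    "kite": {"toy", "outdoor"},
}
adjacency, keywords = build_copurchase_graph(purchases, product_keywords)
```

The adjacency is stored as a dictionary of sets so that repeated co-purchases of the same pair collapse into a single undirected edge, which matches the description's edge rule without introducing edge weights.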

Created:
2025-11-17