
Web Data Commons - Hyperlink Graphs

Source: IEEE | Updated: 2020-08-04 | Indexed: 2026-04-17
Download link:
https://ieee-dataport.org/open-access/web-data-commons-hyperlink-graphs
Description:
The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, it is the largest hyperlink graph available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks. Below we provide instructions on how to download the graphs as well as basic statistics about their topology.

We hope that the graphs will be useful for:
- researchers who develop search algorithms that rank results based on the hyperlinks between pages;
- researchers who develop SPAM detection methods that identify networks of web pages published in order to trick search engines;
- researchers who develop graph analysis algorithms and can use the hyperlink graphs to test the scalability and performance of their tools;
- Web Science researchers who want to analyze the linking patterns within specific topical domains in order to identify the social mechanisms that govern these domains.
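As a rough orientation, the figures above imply an average of about 128 / 3.5 ≈ 36.6 outgoing links per page for the 2012 graph and 64 / 1.7 ≈ 37.6 for the 2014 graph. The sketch below illustrates the kind of link-based ranking mentioned in the first use case, using PageRank computed by power iteration on a tiny toy graph. It assumes a hypothetical edge-list layout of one `source<TAB>target` pair of integer page IDs per line; check the actual file format on the download page before adapting it. The helper names (`read_edges`, `average_out_degree`, `pagerank`) are illustrative only and are not part of the dataset distribution.

```python
from collections import defaultdict


def read_edges(lines):
    """Parse hypothetical 'source<TAB>target' lines into (int, int) pairs."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        src, dst = line.split("\t")
        edges.append((int(src), int(dst)))
    return edges


def average_out_degree(num_pages, num_links):
    """Average number of outgoing hyperlinks per page."""
    return num_links / num_pages


def pagerank(edges, damping=0.85, iterations=50):
    """Plain PageRank by power iteration; ranks sum to 1."""
    nodes = set()
    out_links = defaultdict(list)
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each page passes its damped rank evenly along its outgoing links.
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    print(round(average_out_degree(3.5e9, 128e9), 1))
    toy = read_edges(["0\t1", "1\t2", "2\t0", "2\t1"])
    for page, score in sorted(pagerank(toy).items()):
        print(page, round(score, 3))
```

This in-memory version only works for toy inputs; a graph with tens of billions of edges would need compressed or distributed graph processing, which is exactly the scalability testing the description has in mind.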
Provided by:
Outman, Alexander
Created:
2020-08-04