lawfareblog.csv.gz
Dataset Overview
Dataset Name
Pagerank Project
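Dataset Description
This dataset contains two files that store example "web graph" data. The files are used to build a simple search engine for https://www.lawfareblog.com, a website that provides legal analysis of US national security issues.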
Data Files
- small.csv.gz
  - Description: Contains the example graph from the paper Deeper Inside PageRank.
  - Contents: A small graph stored as a CSV file with node and edge information; each row gives the source and target of one edge.
  - Example:
      source,target
      1,2
      1,3
      3,1
      3,2
      3,5
      4,5
      4,6
      5,6
      5,4
      6,4
- lawfareblog.csv.gz
  - Description: Contains the link structure of the Lawfare blog.
  - Contents: Stored as a CSV file with node and edge information; node names are URLs.
  - Example:
      source,target
      www.lawfareblog.com/,www.lawfareblog.com/topic/interrogation
      www.lawfareblog.com/,www.lawfareblog.com/upcoming-events
      www.lawfareblog.com/,www.lawfareblog.com/
      www.lawfareblog.com/,www.lawfareblog.com/our-comments-policy
      www.lawfareblog.com/,www.lawfareblog.com/litigation-documents-related-appointment-matthew-whitaker-acting-attorney-general
      www.lawfareblog.com/,www.lawfareblog.com/topic/lawfare-research-paper-series
      www.lawfareblog.com/,www.lawfareblog.com/topic/book-reviews
      www.lawfareblog.com/,www.lawfareblog.com/documents-related-mueller-investigation
      www.lawfareblog.com/,www.lawfareblog.com/topic/international-law-loac
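Both files share the same two-column layout, so one reader can serve either. The following is a minimal sketch using only the Python standard library; the helper name `read_edges` is hypothetical, and the temporary file simply recreates the small example graph so the snippet runs without the real data.

```python
import csv
import gzip
import os
import tempfile


def read_edges(path):
    """Read (source, target) pairs from a gzip-compressed CSV with a header row."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [(row["source"], row["target"]) for row in reader]


# Recreate the small example graph in a temporary file so the sketch
# runs without the real data files.
SMALL_CSV = "source,target\n1,2\n1,3\n3,1\n3,2\n3,5\n4,5\n4,6\n5,6\n5,4\n6,4\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "small.csv.gz")
    with gzip.open(demo_path, mode="wt", encoding="utf-8") as handle:
        handle.write(SMALL_CSV)
    edges = read_edges(demo_path)

# Node identifiers are kept as strings, which also suits the URL-named nodes.
nodes = sorted({node for edge in edges for node in edge})
print(len(edges), "edges,", len(nodes), "nodes")
```

Pointing `read_edges` at lawfareblog.csv.gz should work the same way, assuming that file uses the same `source,target` header as the example rows.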
Dataset Statistics
- Total links: 1610789
- Nodes: 25761
- Sparsity: 0.0024274297384360172
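Statistics of this kind can be derived from an edge list. The sketch below runs on the small example graph; the function name `graph_stats` is hypothetical. The source does not say how the sparsity figure is computed, so the ratio used here (links divided by nodes squared) is an assumption: it lands close to the listed value for the reported totals, but it may not be the original formula.

```python
def graph_stats(edges):
    """Count links and distinct nodes, and compute links / nodes**2.

    The ratio is an assumed definition of the "sparsity" figure;
    the source does not state the formula it used.
    """
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    ratio = len(edges) / (n * n) if n else 0.0
    return {"links": len(edges), "nodes": n, "ratio": ratio}


# Edge list of the small example graph.
small_edges = [
    (1, 2), (1, 3), (3, 1), (3, 2), (3, 5),
    (4, 5), (4, 6), (5, 6), (5, 4), (6, 4),
]

stats = graph_stats(small_edges)
print(stats)
```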
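Dataset Purpose
The dataset is used to compute PageRank values for web pages and to support keyword-based search queries. Tuning parameters such as --filter_ratio and --alpha can improve both the quality of the search results and the computational efficiency.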
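To illustrate the PageRank computation, here is a minimal power-iteration sketch on the small example graph. It is not the project's own implementation. It assumes that --alpha corresponds to the usual damping factor, uses uniform teleportation, and spreads the rank of nodes without outgoing links evenly over all nodes. The function name `pagerank` and the `tol` and `max_iter` parameters are hypothetical, and --filter_ratio is not modeled.

```python
def pagerank(edges, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration; alpha is assumed to be the damping factor."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        # Repeated edges are kept, so they weigh more than single ones.
        out_links[source].append(target)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - alpha) / n + alpha * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            if targets:
                share = alpha * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Stop once the L1 change between iterations is below tol.
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Edge list of the small example graph.
small_edges = [
    (1, 2), (1, 3), (3, 1), (3, 2), (3, 5),
    (4, 5), (4, 6), (5, 6), (5, 4), (6, 4),
]

ranks = pagerank(small_edges)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

Under this reading of alpha, a lower value gives more weight to teleportation and typically converges in fewer iterations, while a higher value follows the link structure more closely.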




