ComplexDataLab/CrediBench
Source: Hugging Face. Updated 2026-01-01. Indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/ComplexDataLab/CrediBench
Description:
CrediBench 1.1 is a large-scale temporal web graph built from Common Crawl data. The dataset contains monthly slices of the web network, each with over 1 billion edges and 45 million nodes. Nodes represent website domains, and edges represent directed hyperlinks between them. The dataset is supplemented with text attributes obtained from Common Crawl and web scraping, as well as credibility scores provided by Lin et al. for supervised and semi-supervised learning. It is intended for research on combating online misinformation, specifically for developing methods that detect unreliable domains from spatio-temporal cues.
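To make the graph structure described above concrete, here is a minimal Python sketch of a temporal web graph held as one adjacency map per month. The toy edge records, the domain names, and the helper functions `build_monthly_slices` and `in_degree` are all hypothetical and for illustration only; the dataset's actual file layout and column names are not given in this listing.

```python
from collections import defaultdict

# Hypothetical toy records: (month, source_domain, target_domain).
# These are NOT taken from CrediBench; they only mimic its described shape
# (nodes are domains, edges are directed hyperlinks, one slice per month).
edges = [
    ("2024-01", "a.example", "b.example"),
    ("2024-01", "c.example", "b.example"),
    ("2024-02", "a.example", "b.example"),
    ("2024-02", "b.example", "c.example"),
]


def build_monthly_slices(records):
    """Group directed edges into one adjacency map per month."""
    slices = defaultdict(lambda: defaultdict(set))
    for month, src, dst in records:
        slices[month][src].add(dst)
    return slices


def in_degree(adjacency):
    """Count incoming hyperlinks per domain within one monthly slice."""
    counts = defaultdict(int)
    for targets in adjacency.values():
        for dst in targets:
            counts[dst] += 1
    return dict(counts)


slices = build_monthly_slices(edges)
# In-degree per domain, computed separately for each monthly slice.
in_degree_by_month = {
    month: in_degree(adjacency) for month, adjacency in sorted(slices.items())
}
```

Comparing a domain's in-degree from one month to the next is one simple example of a temporal cue. At the scale quoted above, an in-memory dictionary like this would likely not be practical, so treat it as a conceptual model only.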
Provided by:
ComplexDataLab



