Extracted external domains from Wikipedia dump - 20/03/2024
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/11076686
Description:
This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages.
The external links section of a page such as OpenWeb contains only one link.
The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawling.
The dataset is structured as a text file, with each line representing a distinct domain.
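As a rough illustration of how a one-domain-per-line file like this might feed crawl prioritization, the following Python sketch loads the domains into a set and moves URLs whose host appears in that set to the front of a crawl queue. The function names `load_domains` and `prioritize` are hypothetical, and the exact-match, lowercase comparison is an assumption: the description does not say whether entries carry prefixes such as `www.` or how they are normalized.

```python
from urllib.parse import urlsplit


def load_domains(path):
    """Read one domain per line into a set, skipping blank lines.

    Lowercasing is an assumption; the dataset description does not
    specify how domains are normalized.
    """
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def prioritize(urls, known_domains):
    """Return URLs with hosts found in known_domains first.

    sorted() is stable, so the original order is kept within each group.
    Matching is exact on the hostname (no subdomain or suffix logic).
    """
    def rank(url):
        host = (urlsplit(url).hostname or "").lower()
        return 0 if host in known_domains else 1

    return sorted(urls, key=rank)
```

A crawler could call `load_domains` once at startup on the downloaded file and pass the resulting set to `prioritize` for each batch of candidate URLs. A set gives constant-time membership checks, which matters with several million entries.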
Created:
2024-04-30



