eot2024_hostlevel_logs
Source: huggingface.co (indexed 2025-03-24)
Download link:
https://huggingface.co/datasets/commoncrawl/eot2024_hostlevel_logs
Description:
This dataset is a host-level summary of the initial crawl logs for the End of Term 2024 dataset.
Since this project will not finish until January 2025, please do not ask for access unless you
are directly involved in this effort. Organizations involved are the Library of Congress, the
Internet Archive, the University of North Texas Libraries, Stanford University Libraries, the
US Government Publishing Office, the US National Archives, and the Common Crawl Foundation.
Provider: huggingface.co