JohnLyu/cc_main_2024_51_links_pdf_url
Source: Hugging Face; updated 2025-03-18; indexed 2025-08-30
Download link:
https://hf-mirror.com/datasets/JohnLyu/cc_main_2024_51_links_pdf_url
Description:
The dataset includes the fields user ID, URL, text content, filename, and page URL, and is suitable for training natural language processing models. The training set contains approximately 280 million samples, and the dataset totals about 94.6 GB.
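
At this size it is usually more practical to stream a few records than to download everything. The following is a minimal sketch, not an official loading recipe: it assumes the Hugging Face `datasets` library is installed, that the split is named `train`, and that the listing's figures are in decimal gigabytes. The helper names `approx_bytes_per_sample` and `peek` are hypothetical, and the actual column names are not confirmed by this listing, so inspect the keys of a returned record before relying on them.

```python
from itertools import islice

# Dataset identifier taken from the listing title.
DATASET_ID = "JohnLyu/cc_main_2024_51_links_pdf_url"


def approx_bytes_per_sample(total_gb: float = 94.6, n_samples: float = 280e6) -> float:
    """Rough average record size from the listing's approximate figures.

    Assumes decimal gigabytes (1 GB = 1e9 bytes); both inputs are estimates.
    """
    return total_gb * 1e9 / n_samples


def peek(n: int = 3) -> list:
    """Stream the first n records without downloading the full dataset.

    Assumption: the split is called "train"; adjust if the repository differs.
    """
    # Imported lazily so the arithmetic helper works without the library installed.
    from datasets import load_dataset

    stream = load_dataset(DATASET_ID, split="train", streaming=True)
    return list(islice(stream, n))
```

Calling `peek()` requires network access; `approx_bytes_per_sample()` works offline and suggests an average of a few hundred bytes per record, which is consistent with short link-level entries rather than full document text.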
Provider:
JohnLyu



