WenWW/KD_fineweb_sub
Source: Hugging Face | Updated: 2024-12-03 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/WenWW/KD_fineweb_sub
Description:
The dataset contains document text and a unique ID for each example, plus metadata fields: date, dump, file path, integer score, language, language score, score, token count, and URL. It provides a training split of 4,285,714 examples totaling 21,776,985,868 bytes (about 21.8 GB).
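As an illustration of how the metadata can be used, here is a minimal sketch that filters records by language and quality score and sums their token counts. It assumes the field names `language`, `language_score`, `int_score`, and `token_count` (modeled on a FineWeb-style schema and not confirmed by this listing), and it uses a few hypothetical in-memory records so it runs offline. The helper names and thresholds are illustrative choices, not part of the dataset.

```python
from typing import Any, Iterable, Iterator

# Hypothetical records for illustration only. The field names are ASSUMED
# (FineWeb-style); check them against the actual dataset before relying on them.
SAMPLE_RECORDS: list[dict[str, Any]] = [
    {"id": "doc-1", "text": "...", "language": "en", "language_score": 0.97, "int_score": 3, "token_count": 120},
    {"id": "doc-2", "text": "...", "language": "en", "language_score": 0.95, "int_score": 1, "token_count": 80},
    {"id": "doc-3", "text": "...", "language": "fr", "language_score": 0.99, "int_score": 4, "token_count": 200},
    {"id": "doc-4", "text": "...", "language": "en", "language_score": 0.65, "int_score": 4, "token_count": 150},
    {"id": "doc-5", "text": "...", "language": "en", "language_score": 0.92, "int_score": 5, "token_count": 300},
]


def select_records(
    records: Iterable[dict[str, Any]],
    language: str = "en",
    min_int_score: int = 3,
    min_language_score: float = 0.9,
) -> Iterator[dict[str, Any]]:
    """Yield records in the given language whose integer score and language
    score meet the (arbitrary, example) thresholds."""
    for rec in records:
        if (
            rec["language"] == language
            and rec["int_score"] >= min_int_score
            and rec["language_score"] >= min_language_score
        ):
            yield rec


def total_tokens(records: Iterable[dict[str, Any]]) -> int:
    """Sum the per-record token counts."""
    return sum(rec["token_count"] for rec in records)


if __name__ == "__main__":
    kept = list(select_records(SAMPLE_RECORDS))
    print([rec["id"] for rec in kept], total_tokens(kept))  # prints ['doc-1', 'doc-5'] 420
```

To apply this to the real data, one option (assuming the Hugging Face `datasets` library is installed and the repository is accessible) is to pass the iterable returned by `load_dataset("WenWW/KD_fineweb_sub", split="train", streaming=True)` in place of `SAMPLE_RECORDS`; streaming yields examples lazily instead of downloading the whole split before iteration begins.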
Provided by:
WenWW