KathirKs/CC-MAIN-2016-22_row_wise_20240907_040322
Source: Hugging Face. Updated: 2024-09-07. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/KathirKs/CC-MAIN-2016-22_row_wise_20240907_040322
Description:
The dataset contains text data, unique identifiers, and metadata. The text is stored as a sequence of strings; the metadata has four fields (dump, file_path, id, and url), all of type string. There is a single training split with 1,741,330 examples totaling 22,594,450,822 bytes; the download size is 8,503,412,274 bytes. The configuration is named default, and its data files are located at data/train-*.
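The sizes quoted above can be put in perspective with a little arithmetic. The following Python sketch derives the average size per example and the download-to-dataset size ratio from the stated figures. It also includes an optional helper for previewing a few rows. The helper names (`size_summary`, `stream_first_rows`) are hypothetical, and the streaming helper assumes the Hugging Face `datasets` library is installed and the repository is reachable; it is not called by default.

```python
# Figures quoted in the dataset description.
NUM_EXAMPLES = 1_741_330
DATASET_BYTES = 22_594_450_822
DOWNLOAD_BYTES = 8_503_412_274

REPO_ID = "KathirKs/CC-MAIN-2016-22_row_wise_20240907_040322"


def size_summary(num_examples: int, dataset_bytes: int, download_bytes: int) -> dict:
    """Derive simple size statistics from the quoted figures."""
    return {
        "avg_bytes_per_example": dataset_bytes / num_examples,
        "download_fraction": download_bytes / dataset_bytes,
        "dataset_gib": dataset_bytes / 2**30,
        "download_gib": download_bytes / 2**30,
    }


def stream_first_rows(n: int = 3) -> list:
    """Preview the first n rows without downloading the whole dataset.

    Assumption: the `datasets` library is installed and the repository
    is reachable. Imported lazily so the arithmetic above has no
    third-party dependency.
    """
    from datasets import load_dataset

    ds = load_dataset(REPO_ID, name="default", split="train", streaming=True)
    return list(ds.take(n))


SUMMARY = size_summary(NUM_EXAMPLES, DATASET_BYTES, DOWNLOAD_BYTES)

if __name__ == "__main__":
    for key, value in SUMMARY.items():
        print(f"{key}: {value:.4f}")
```

By this calculation each example averages roughly 13 KB, and the download is a bit under 38% of the full dataset size. Streaming is suggested for the preview because the full download is about 8.5 GB.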
Provider:
KathirKs



