thesven/Wikipedia-EN-Small-CPT
Source: Hugging Face. Updated: 2024-08-09. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/thesven/Wikipedia-EN-Small-CPT
Description:
The dataset contains 1,492,542 samples, each with five features: id, url, title, text, and token_length. The first four are strings and token_length is an int64. The dataset totals 609,537,511 bytes (264,539,842 bytes to download) and has a single train split.
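
The schema described above can be sketched in Python as follows. This is a minimal illustration, not the provider's own tooling: the names `EXPECTED_FEATURES`, `validate_record`, `load_train_split`, and the `sample` record are hypothetical, and the loader assumes the Hugging Face `datasets` package is installed and the repository is reachable under the id shown in the listing.

```python
from typing import Any, Mapping

# Schema as described in the listing: four string fields and one int64 field.
EXPECTED_FEATURES = {
    "id": str,
    "url": str,
    "title": str,
    "text": str,
    "token_length": int,
}

# Figures quoted in the listing.
NUM_SAMPLES = 1_492_542
DATASET_BYTES = 609_537_511
DOWNLOAD_BYTES = 264_539_842


def validate_record(record: Mapping[str, Any]) -> bool:
    """Return True if the record has exactly the expected fields and types."""
    if set(record) != set(EXPECTED_FEATURES):
        return False
    return all(isinstance(record[k], t) for k, t in EXPECTED_FEATURES.items())


def load_train_split(streaming: bool = True):
    """Load the single train split (assumes `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset(
        "thesven/Wikipedia-EN-Small-CPT", split="train", streaming=streaming
    )


# Hypothetical record, made up for illustration; not taken from the dataset.
sample = {
    "id": "12",
    "url": "https://en.wikipedia.org/wiki/Example",
    "title": "Example",
    "text": "An example article body.",
    "token_length": 5,
}

is_valid = validate_record(sample)
avg_bytes_per_sample = DATASET_BYTES / NUM_SAMPLES   # roughly 408 bytes
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES      # roughly 0.434
```

Streaming is used as the default in the sketch so that inspecting a few records does not require fetching the full download first; pass `streaming=False` to materialize the split locally.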
Provider:
thesven



