carsondial/fineweb-arctic
Hugging Face | Updated 2025-10-30 | Indexed 2025-11-15
Download link:
https://hf-mirror.com/datasets/carsondial/fineweb-arctic
Description:
The dataset has three feature fields: id (string), text (string), and embedding (a sequence of floats). It has a single split, train, containing 3,000,000 examples. The download size is 18,371,017,208 bytes and the dataset size is 23,498,835,119 bytes, all of it in the train split. In the default configuration, the training data files are stored under the path data/train-*.
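As a rough illustration of the description above, the following Python sketch derives two figures from the quoted sizes and shows one way the train split might be previewed. It assumes the Hugging Face `datasets` library is installed and that the repository ID `carsondial/fineweb-arctic` resolves on the Hub; the helper names (`average_bytes_per_example`, `download_to_dataset_ratio`, `peek`) are hypothetical and not part of the dataset card.

```python
from itertools import islice

# Figures quoted in the dataset description.
NUM_EXAMPLES = 3_000_000
DATASET_SIZE_BYTES = 23_498_835_119
DOWNLOAD_SIZE_BYTES = 18_371_017_208


def average_bytes_per_example() -> float:
    """Average size of one example (id + text + embedding) in bytes."""
    return DATASET_SIZE_BYTES / NUM_EXAMPLES


def download_to_dataset_ratio() -> float:
    """Fraction of the dataset size that has to be downloaded."""
    return DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES


def peek(n: int = 3) -> list[dict]:
    """Stream the first n rows of the train split (needs network access)."""
    # Imported lazily so the size helpers above work without `datasets`.
    from datasets import load_dataset

    # streaming=True avoids fetching the full download before reading rows.
    stream = load_dataset(
        "carsondial/fineweb-arctic", split="train", streaming=True
    )
    return list(islice(stream, n))


print(f"about {average_bytes_per_example():.0f} bytes per example")
print(f"download is about {download_to_dataset_ratio():.0%} of the dataset size")
```

Calling `peek()` should return dictionaries with the keys `id`, `text`, and `embedding`; the length of each `embedding` list is not stated in the description, so it is best checked on the returned rows rather than assumed.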
Provider:
carsondial



