claran/seed-pretrain-decon-parquet

Source: Hugging Face. Updated 2024-10-19; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/claran/seed-pretrain-decon-parquet
Description:
The dataset contains two configurations, gutenberg and wikipedia. The gutenberg configuration has 68,319 training examples, with a total size of 25,032,585,417 bytes and a download size of 15,328,016,129 bytes. The wikipedia configuration has 6,543,381 training examples, with a total size of 16,769,628,203 bytes and a download size of 9,557,680,471 bytes. Each configuration's features include the fields added, created, id, and metadata, among others; the metadata field in turn contains subfields such as gutenberg_metadata_available, issued_or_updated_available, and length.
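
To put the figures in the description in perspective, the following minimal sketch derives the average example size and the download-to-total size ratio for each configuration. The numbers are copied from the description; the names `CONFIGS`, `summarize`, and `stream_config` are hypothetical and introduced only for illustration. The `stream_config` helper assumes that the Hugging Face `datasets` library is installed, that the repository is still reachable, and that the split is named `train` (the description only says "training examples"), so treat it as an untested assumption rather than a documented recipe.

```python
# Figures copied from the dataset description; this part needs no network access.
CONFIGS = {
    "gutenberg": {
        "num_examples": 68_319,
        "dataset_size": 25_032_585_417,   # bytes, total size
        "download_size": 15_328_016_129,  # bytes, download size
    },
    "wikipedia": {
        "num_examples": 6_543_381,
        "dataset_size": 16_769_628_203,
        "download_size": 9_557_680_471,
    },
}


def summarize(configs):
    """Return per-configuration average example size and download/total ratio."""
    summary = {}
    for name, info in configs.items():
        summary[name] = {
            "avg_bytes_per_example": info["dataset_size"] / info["num_examples"],
            "download_ratio": info["download_size"] / info["dataset_size"],
        }
    return summary


def stream_config(name):
    """Hypothetical helper: stream one configuration without a full download.

    Assumes the `datasets` library is installed and the split is called "train".
    """
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset(
        "claran/seed-pretrain-decon-parquet",
        name,
        split="train",
        streaming=True,
    )


if __name__ == "__main__":
    for name, stats in summarize(CONFIGS).items():
        print(
            f"{name}: {stats['avg_bytes_per_example']:.0f} bytes/example, "
            f"download is {stats['download_ratio']:.1%} of total size"
        )
```

The averages show how differently the two configurations are shaped: gutenberg holds far fewer but much longer examples (whole books), while wikipedia holds millions of short ones. Streaming is suggested here only because each configuration is several gigabytes to download.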
Provider:
claran