orionweller/dolma_20bn_cc_high_quality

Source: Hugging Face. Updated 2024-06-12; indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/orionweller/dolma_20bn_cc_high_quality
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: added
    dtype: string
  - name: created
    dtype: string
  - name: source
    dtype: string
  - name: original_shard_dir
    dtype: string
  - name: original_shard_idx
    dtype: int64
  - name: num_tokens
    dtype: int64
  splits:
  - name: shard_0
    num_bytes: 10159054860
    num_examples: 2023832
  - name: shard_1
    num_bytes: 10335510418
    num_examples: 2694921
  - name: shard_2
    num_bytes: 10116720957
    num_examples: 2707244
  - name: shard_3
    num_bytes: 10273891132
    num_examples: 2819598
  - name: shard_4
    num_bytes: 10254072173
    num_examples: 2933450
  - name: shard_5
    num_bytes: 10252684183
    num_examples: 2811040
  - name: shard_6
    num_bytes: 10251251969
    num_examples: 2854604
  - name: shard_7
    num_bytes: 10268677922
    num_examples: 2812943
  - name: shard_8
    num_bytes: 10204523454
    num_examples: 2825417
  - name: shard_9
    num_bytes: 10128825804
    num_examples: 2039899
  - name: shard_10
    num_bytes: 10093223131
    num_examples: 7575012
  - name: shard_11
    num_bytes: 10068649531
    num_examples: 2445094
  - name: shard_12
    num_bytes: 6413670041
    num_examples: 2427760
  download_size: 71821332383
  dataset_size: 128820755575
configs:
- config_name: default
  data_files:
  - split: shard_0
    path: data/shard_0-*
  - split: shard_1
    path: data/shard_1-*
  - split: shard_2
    path: data/shard_2-*
  - split: shard_3
    path: data/shard_3-*
  - split: shard_4
    path: data/shard_4-*
  - split: shard_5
    path: data/shard_5-*
  - split: shard_6
    path: data/shard_6-*
  - split: shard_7
    path: data/shard_7-*
  - split: shard_8
    path: data/shard_8-*
  - split: shard_9
    path: data/shard_9-*
  - split: shard_10
    path: data/shard_10-*
  - split: shard_11
    path: data/shard_11-*
  - split: shard_12
    path: data/shard_12-*
---
Provider:
orionweller
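
The dataset card lists thirteen named splits (shard_0 through shard_12) rather than the usual train/test layout, so a split name has to be passed explicitly when loading. The sketch below is one possible way to work with that layout, not the provider's own tooling. It copies the per-split sizes from the card, checks that they add up to the stated dataset_size, and defines a helper that streams a few records from one shard. The helper names (`total_bytes`, `total_examples`, `stream_shard`) and the `limit` parameter are hypothetical. The helper assumes the Hugging Face `datasets` package is installed and that the repository is reachable from your network.

```python
from itertools import islice

# Split metadata copied from the dataset card: split -> (num_bytes, num_examples).
SPLITS = {
    "shard_0": (10159054860, 2023832),
    "shard_1": (10335510418, 2694921),
    "shard_2": (10116720957, 2707244),
    "shard_3": (10273891132, 2819598),
    "shard_4": (10254072173, 2933450),
    "shard_5": (10252684183, 2811040),
    "shard_6": (10251251969, 2854604),
    "shard_7": (10268677922, 2812943),
    "shard_8": (10204523454, 2825417),
    "shard_9": (10128825804, 2039899),
    "shard_10": (10093223131, 7575012),
    "shard_11": (10068649531, 2445094),
    "shard_12": (6413670041, 2427760),
}
DATASET_SIZE = 128820755575  # dataset_size as stated in the card
REPO_ID = "orionweller/dolma_20bn_cc_high_quality"


def total_bytes():
    """Sum of num_bytes over all splits."""
    return sum(num_bytes for num_bytes, _ in SPLITS.values())


def total_examples():
    """Sum of num_examples over all splits."""
    return sum(num_examples for _, num_examples in SPLITS.values())


def stream_shard(index, limit=3):
    """Return the first `limit` records of shard_<index> (hypothetical helper).

    Uses streaming mode so the roughly 10 GB split is not downloaded in full.
    """
    name = f"shard_{index}"
    if name not in SPLITS:
        raise ValueError(f"unknown split: {name}")
    # Imported lazily so the size checks above work without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, split=name, streaming=True)
    return list(islice(dataset, limit))
```

Calling `stream_shard(0)` should return a short list of records, each a dict with the fields listed under `features` in the card (`id`, `text`, `added`, `created`, `source`, `original_shard_dir`, `original_shard_idx`, `num_tokens`). The per-split byte counts in the card sum exactly to the stated `dataset_size`, which is a quick consistency check on the metadata.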