
ohsuz/fineweb_10000

Source: Hugging Face. Updated 2024-06-09. Indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/ohsuz/fineweb_10000
Resource description:
---
dataset_info:
- config_name: range_1000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 7528809
    num_examples: 1000
  download_size: 3782220
  dataset_size: 7528809
- config_name: range_10000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 9656763
    num_examples: 1000
  download_size: 4951550
  dataset_size: 9656763
- config_name: range_2000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 8528924
    num_examples: 1000
  download_size: 4283140
  dataset_size: 8528924
- config_name: range_3000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 8390947
    num_examples: 1000
  download_size: 4225446
  dataset_size: 8390947
- config_name: range_4000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 8621320
    num_examples: 1000
  download_size: 4342329
  dataset_size: 8621320
- config_name: range_5000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 8783296
    num_examples: 1000
  download_size: 4481703
  dataset_size: 8783296
- config_name: range_6000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 9167386
    num_examples: 1000
  download_size: 4675599
  dataset_size: 9167386
- config_name: range_7000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 8465596
    num_examples: 1000
  download_size: 4340244
  dataset_size: 8465596
- config_name: range_8000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 8945078
    num_examples: 1000
  download_size: 4579263
  dataset_size: 8945078
- config_name: range_9000
  features:
  - name: eng
    dtype: string
  - name: ko
    dtype: string
  splits:
  - name: train
    num_bytes: 8757161
    num_examples: 1000
  download_size: 4486561
  dataset_size: 8757161
configs:
- config_name: range_1000
  data_files:
  - split: train
    path: range_1000/train-*
- config_name: range_10000
  data_files:
  - split: train
    path: range_10000/train-*
- config_name: range_2000
  data_files:
  - split: train
    path: range_2000/train-*
- config_name: range_3000
  data_files:
  - split: train
    path: range_3000/train-*
- config_name: range_4000
  data_files:
  - split: train
    path: range_4000/train-*
- config_name: range_5000
  data_files:
  - split: train
    path: range_5000/train-*
- config_name: range_6000
  data_files:
  - split: train
    path: range_6000/train-*
- config_name: range_7000
  data_files:
  - split: train
    path: range_7000/train-*
- config_name: range_8000
  data_files:
  - split: train
    path: range_8000/train-*
- config_name: range_9000
  data_files:
  - split: train
    path: range_9000/train-*
---
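The metadata above declares ten configurations, each with a single train split and two string columns, eng and ko. A minimal sketch of how one configuration might be loaded with the Hugging Face datasets library follows. The helper names config_names and load_config are hypothetical, introduced only for illustration, and the sketch assumes the repository is still reachable under the ID shown in the download link.

```python
from typing import List

# Repository ID taken from the download link on this page.
REPO_ID = "ohsuz/fineweb_10000"


def config_names(step: int = 1000, count: int = 10) -> List[str]:
    """Build the config names listed in the metadata: range_1000 ... range_10000.

    Hypothetical helper; the names follow the pattern seen in the card.
    """
    return [f"range_{step * i}" for i in range(1, count + 1)]


def load_config(name: str):
    """Load the train split of one config (requires network access).

    The import is deferred so the module can be imported without
    the `datasets` package installed.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="train")


# Example usage (not executed here because it downloads data):
#   ds = load_config("range_1000")
#   row = ds[0]
#   print(row["eng"][:80], row["ko"][:80])
```

Passing the config name as the second argument selects which of the range_* subsets is downloaded, so only the files matching that config's path pattern (for example range_1000/train-*) are fetched.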
Provider:
ohsuz
Summary of original information

Dataset overview

Dataset configurations

range_1000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 7528809
      • Examples: 1000
  • Download size: 3782220
  • Dataset size: 7528809

range_10000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 9656763
      • Examples: 1000
  • Download size: 4951550
  • Dataset size: 9656763

range_2000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 8528924
      • Examples: 1000
  • Download size: 4283140
  • Dataset size: 8528924

range_3000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 8390947
      • Examples: 1000
  • Download size: 4225446
  • Dataset size: 8390947

range_4000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 8621320
      • Examples: 1000
  • Download size: 4342329
  • Dataset size: 8621320

range_5000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 8783296
      • Examples: 1000
  • Download size: 4481703
  • Dataset size: 8783296

range_6000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 9167386
      • Examples: 1000
  • Download size: 4675599
  • Dataset size: 9167386

range_7000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 8465596
      • Examples: 1000
  • Download size: 4340244
  • Dataset size: 8465596

range_8000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 8945078
      • Examples: 1000
  • Download size: 4579263
  • Dataset size: 8945078

range_9000

  • Features:
    • eng: string
    • ko: string
  • Splits:
    • train:
      • Bytes: 8757161
      • Examples: 1000
  • Download size: 4486561
  • Dataset size: 8757161
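
The per-configuration figures above can be aggregated to get a sense of the whole dataset. The sketch below copies the sizes from the listing and derives totals, the average size per example, and the ratio of download size to dataset size. The variable names are illustrative. The reading of download size as the size of the stored files and dataset size as the size of the decoded data is the usual Hugging Face convention, taken here as an assumption.

```python
# (dataset_size, download_size) in bytes for each config, copied from the listing.
SIZES = {
    "range_1000": (7528809, 3782220),
    "range_10000": (9656763, 4951550),
    "range_2000": (8528924, 4283140),
    "range_3000": (8390947, 4225446),
    "range_4000": (8621320, 4342329),
    "range_5000": (8783296, 4481703),
    "range_6000": (9167386, 4675599),
    "range_7000": (8465596, 4340244),
    "range_8000": (8945078, 4579263),
    "range_9000": (8757161, 4486561),
}
EXAMPLES_PER_CONFIG = 1000  # every config lists 1000 train examples

total_dataset = sum(dataset for dataset, _ in SIZES.values())
total_download = sum(download for _, download in SIZES.values())
total_examples = EXAMPLES_PER_CONFIG * len(SIZES)

# Average decoded size of one eng/ko pair, and on-disk vs. decoded ratio.
avg_bytes_per_example = total_dataset / total_examples
download_ratio = total_download / total_dataset

largest = max(SIZES, key=lambda name: SIZES[name][0])
smallest = min(SIZES, key=lambda name: SIZES[name][0])

print(f"examples:        {total_examples}")
print(f"dataset bytes:   {total_dataset}")
print(f"download bytes:  {total_download}")
print(f"avg bytes/row:   {avg_bytes_per_example:.1f}")
print(f"download ratio:  {download_ratio:.3f}")
print(f"largest config:  {largest}")
print(f"smallest config: {smallest}")
```

Across all ten configurations this comes to 10,000 examples, 86,845,280 bytes of data, and 44,148,055 bytes to download.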