
WikiQuality/length_lo

Source: Hugging Face | Updated: 2024-08-06 | Indexed: 2024-06-29
Download link:
https://hf-mirror.com/datasets/WikiQuality/length_lo
Resource overview:
---
dataset_info:
- config_name: am
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 19154347.178477906
    num_examples: 10703
  - name: test
    num_bytes: 2129652.7274959083
    num_examples: 1190
  download_size: 5763990
  dataset_size: 21283999.905973814
- config_name: ary
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 7480349.930011521
    num_examples: 5067
  - name: test
    num_bytes: 831149.9922235023
    num_examples: 563
  download_size: 2183784
  dataset_size: 8311499.922235023
- config_name: ha
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 58214622.185886554
    num_examples: 27437
  - name: test
    num_bytes: 6469234.356699643
    num_examples: 3049
  download_size: 22631675
  dataset_size: 64683856.5425862
- config_name: ig
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 47687272.71655538
    num_examples: 16710
  - name: test
    num_bytes: 5299537.129541791
    num_examples: 1857
  download_size: 17768068
  dataset_size: 52986809.84609717
- config_name: om
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 2841509.804195804
    num_examples: 1589
  - name: test
    num_bytes: 316518.0839160839
    num_examples: 177
  download_size: 1040038
  dataset_size: 3158027.888111888
- config_name: pcm
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 1283660.0853970966
    num_examples: 856
  - name: test
    num_bytes: 143961.87873612298
    num_examples: 96
  download_size: 584912
  dataset_size: 1427621.9641332196
- config_name: rw
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 8601014.624819625
    num_examples: 5995
  - name: test
    num_bytes: 956943.5787747606
    num_examples: 667
  download_size: 3515373
  dataset_size: 9557958.203594387
- config_name: sw
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 51434210.97419607
    num_examples: 48281
  - name: test
    num_bytes: 5715385.801382778
    num_examples: 5365
  download_size: 19623384
  dataset_size: 57149596.77557885
- config_name: ti
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 615180.1512195122
    num_examples: 362
  - name: test
    num_bytes: 69675.1
    num_examples: 41
  download_size: 201719
  dataset_size: 684855.2512195122
- config_name: ts
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 628676.3983628922
    num_examples: 584
  - name: test
    num_bytes: 69972.5443383356
    num_examples: 65
  download_size: 246039
  dataset_size: 698648.9427012278
- config_name: tw
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 5733890.1976158945
    num_examples: 2942
  - name: test
    num_bytes: 637315.4638410596
    num_examples: 327
  download_size: 2137634
  dataset_size: 6371205.661456954
- config_name: yo
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 10703058.614788147
    num_examples: 12248
  - name: test
    num_bytes: 1189325.830725561
    num_examples: 1361
  download_size: 4194959
  dataset_size: 11892384.445513707
configs:
- config_name: am
  data_files:
  - split: train
    path: am/train-*
  - split: test
    path: am/test-*
- config_name: ary
  data_files:
  - split: train
    path: ary/train-*
  - split: test
    path: ary/test-*
- config_name: ha
  data_files:
  - split: train
    path: ha/train-*
  - split: test
    path: ha/test-*
- config_name: ig
  data_files:
  - split: train
    path: ig/train-*
  - split: test
    path: ig/test-*
- config_name: om
  data_files:
  - split: train
    path: om/train-*
  - split: test
    path: om/test-*
- config_name: pcm
  data_files:
  - split: train
    path: pcm/train-*
  - split: test
    path: pcm/test-*
- config_name: rw
  data_files:
  - split: train
    path: rw/train-*
  - split: test
    path: rw/test-*
- config_name: sw
  data_files:
  - split: train
    path: sw/train-*
  - split: test
    path: sw/test-*
- config_name: ti
  data_files:
  - split: train
    path: ti/train-*
  - split: test
    path: ti/test-*
- config_name: ts
  data_files:
  - split: train
    path: ts/train-*
  - split: test
    path: ts/test-*
- config_name: tw
  data_files:
  - split: train
    path: tw/train-*
  - split: test
    path: tw/test-*
- config_name: yo
  data_files:
  - split: train
    path: yo/train-*
  - split: test
    path: yo/test-*
---
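The `configs` block maps each language code to `train` and `test` file patterns, so one config and split can be requested by name. What follows is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the Hub (or the hf-mirror.com mirror given in the download link) is reachable. `load_length_lo` and its `mirror` argument are hypothetical helpers for illustration and are not part of the dataset.

```python
import os
from typing import Optional

# Config names and splits as listed in the YAML metadata of this card.
CONFIGS = ("am", "ary", "ha", "ig", "om", "pcm", "rw", "sw", "ti", "ts", "tw", "yo")
SPLITS = ("train", "test")


def load_length_lo(config: str, split: str = "train", mirror: Optional[str] = None):
    """Hypothetical helper: load one config/split of WikiQuality/length_lo.

    Assumes the `datasets` package is installed. If `mirror` is given, it is
    written to HF_ENDPOINT (unless already set). huggingface_hub reads that
    variable when it is first imported, so it only takes effect if neither
    `datasets` nor `huggingface_hub` has been imported yet in this process.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    if mirror is not None:
        os.environ.setdefault("HF_ENDPOINT", mirror)
    # Imported lazily so argument validation works without `datasets` installed.
    from datasets import load_dataset
    return load_dataset("WikiQuality/length_lo", config, split=split)
```

For example, `load_length_lo("sw", "test", mirror="https://hf-mirror.com")` should return a `Dataset` with the `id`, `url`, `title` and `text` columns. Names are validated before any import or network access, so a mistyped config fails immediately.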
Provided by:
WikiQuality
Original information summary

Dataset overview

Dataset configurations

Config name: ha

  • Features
    • id: string
    • url: string
    • title: string
    • text: string
  • Splits
    • train:
      • Bytes: 52854921.08275882
      • Examples: 25208
    • test:
      • Bytes: 2782389.7285314566
      • Examples: 1327
  • Download size (bytes): 21407348
  • Dataset size (bytes): 55637310.81129028

Config name: ig

  • Features
    • id: string
    • url: string
    • title: string
    • text: string
  • Splits
    • train:
      • Bytes: 46649881.97150997
      • Examples: 16376
    • test:
      • Bytes: 2455556.8062678063
      • Examples: 862
  • Download size (bytes): 16925305
  • Dataset size (bytes): 49105438.777777776

Config name: pcm

  • Features
    • id: string
    • url: string
    • title: string
    • text: string
  • Splits
    • train:
      • Bytes: 1162202.3333333333
      • Examples: 778
    • test:
      • Bytes: 61247.166666666664
      • Examples: 41
  • Download size (bytes): 544475
  • Dataset size (bytes): 1223449.5

Config name: sw

  • Features
    • id: string
    • url: string
    • title: string
    • text: string
  • Splits
    • train:
      • Bytes: 52324367.14514379
      • Examples: 57469
    • test:
      • Bytes: 2754201.5802269042
      • Examples: 3025
  • Download size (bytes): 18890934
  • Dataset size (bytes): 55078568.7253707

Config name: yo

  • Features
    • id: string
    • url: string
    • title: string
    • text: string
  • Splits
    • train:
      • Bytes: 3909182.956299801
      • Examples: 8295
    • test:
      • Bytes: 205944.90077191236
      • Examples: 437
  • Download size (bytes): 3692970
  • Dataset size (bytes): 4115127.857071713