
cshjin/poseidon

Source: Hugging Face; updated 2024-06-11; indexed 2024-06-15
Download link:
https://hf-mirror.com/datasets/cshjin/poseidon
Resource summary:
---
language:
- en
license: mit
size_categories:
- 10M<n<100M
task_categories:
- text-classification
dataset_info:
- config_name: 1000genome
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 4610059
    num_examples: 33565
  - name: validation
    num_bytes: 658584
    num_examples: 4795
  - name: test
    num_bytes: 1317341
    num_examples: 9590
  download_size: 1017503
  dataset_size: 6585984
- config_name: 1000genome_v2
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 9054444
    num_examples: 38469
  - name: validation
    num_bytes: 1131626
    num_examples: 4809
  - name: test
    num_bytes: 1131813
    num_examples: 4809
  download_size: 2125750
  dataset_size: 11317883
- config_name: casa_nowcast_full
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 85064826
    num_examples: 613270
  - name: validation
    num_bytes: 12152060
    num_examples: 87610
  - name: test
    num_bytes: 24303924
    num_examples: 175221
  download_size: 22669804
  dataset_size: 121520810
- config_name: casa_wind_full
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 13960352
    num_examples: 103194
  - name: validation
    num_bytes: 1993998
    num_examples: 14741
  - name: test
    num_bytes: 3988544
    num_examples: 29485
  download_size: 2129250
  dataset_size: 19942894
- config_name: eht_difmap
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 1569071
    num_examples: 11573
  - name: validation
    num_bytes: 224133
    num_examples: 1653
  - name: test
    num_bytes: 448293
    num_examples: 3307
  download_size: 204649
  dataset_size: 2241497
- config_name: eht_imaging
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 568309
    num_examples: 4208
  - name: validation
    num_bytes: 81173
    num_examples: 601
  - name: test
    num_bytes: 162491
    num_examples: 1203
  download_size: 48737
  dataset_size: 811973
- config_name: eht_smili
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 762885
    num_examples: 5622
  - name: validation
    num_bytes: 109004
    num_examples: 803
  - name: test
    num_bytes: 218119
    num_examples: 1607
  download_size: 80885
  dataset_size: 1090008
- config_name: montage
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 16573939
    num_examples: 120735
  - name: validation
    num_bytes: 2367854
    num_examples: 17249
  - name: test
    num_bytes: 4734236
    num_examples: 34496
  download_size: 4063598
  dataset_size: 23676029
- config_name: montage_v2
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 31727377
    num_examples: 137984
  - name: validation
    num_bytes: 3966580
    num_examples: 17248
  - name: test
    num_bytes: 3966361
    num_examples: 17248
  download_size: 6738078
  dataset_size: 39660318
- config_name: predict_future_sales
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 8440649
    num_examples: 62369
  - name: validation
    num_bytes: 1206088
    num_examples: 8911
  - name: test
    num_bytes: 2411503
    num_examples: 17820
  download_size: 1294454
  dataset_size: 12058240
- config_name: predict_future_sales_v2
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 16187818
    num_examples: 71280
  - name: validation
    num_bytes: 2023245
    num_examples: 8910
  - name: test
    num_bytes: 2023245
    num_examples: 8910
  download_size: 2711798
  dataset_size: 20234308
- config_name: pycbc_inference
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 802964
    num_examples: 5973
  - name: validation
    num_bytes: 114882
    num_examples: 854
  - name: test
    num_bytes: 229474
    num_examples: 1707
  download_size: 57788
  dataset_size: 1147320
- config_name: pycbc_search
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 17648101
    num_examples: 130345
  - name: validation
    num_bytes: 2521321
    num_examples: 18621
  - name: test
    num_bytes: 5042328
    num_examples: 37242
  download_size: 2896429
  dataset_size: 25211750
- config_name: somospie
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 2867211
    num_examples: 21210
  - name: validation
    num_bytes: 409350
    num_examples: 3029
  - name: test
    num_bytes: 819246
    num_examples: 6061
  download_size: 380228
  dataset_size: 4095807
- config_name: variant_calling
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 14020893
    num_examples: 102581
  - name: validation
    num_bytes: 2003146
    num_examples: 14654
  - name: test
    num_bytes: 4005924
    num_examples: 29310
  download_size: 2682052
  dataset_size: 20029963
configs:
- config_name: 1000genome
  data_files:
  - split: train
    path: 1000genome/train-*
  - split: validation
    path: 1000genome/validation-*
  - split: test
    path: 1000genome/test-*
- config_name: 1000genome_v2
  data_files:
  - split: train
    path: 1000genome_v2/train-*
  - split: validation
    path: 1000genome_v2/validation-*
  - split: test
    path: 1000genome_v2/test-*
- config_name: casa_nowcast_full
  data_files:
  - split: train
    path: casa_nowcast_full/train-*
  - split: validation
    path: casa_nowcast_full/validation-*
  - split: test
    path: casa_nowcast_full/test-*
- config_name: casa_wind_full
  data_files:
  - split: train
    path: casa_wind_full/train-*
  - split: validation
    path: casa_wind_full/validation-*
  - split: test
    path: casa_wind_full/test-*
- config_name: eht_difmap
  data_files:
  - split: train
    path: eht_difmap/train-*
  - split: validation
    path: eht_difmap/validation-*
  - split: test
    path: eht_difmap/test-*
- config_name: eht_imaging
  data_files:
  - split: train
    path: eht_imaging/train-*
  - split: validation
    path: eht_imaging/validation-*
  - split: test
    path: eht_imaging/test-*
- config_name: eht_smili
  data_files:
  - split: train
    path: eht_smili/train-*
  - split: validation
    path: eht_smili/validation-*
  - split: test
    path: eht_smili/test-*
- config_name: montage
  data_files:
  - split: train
    path: montage/train-*
  - split: validation
    path: montage/validation-*
  - split: test
    path: montage/test-*
- config_name: montage_v2
  data_files:
  - split: train
    path: montage_v2/train-*
  - split: validation
    path: montage_v2/validation-*
  - split: test
    path: montage_v2/test-*
- config_name: predict_future_sales
  data_files:
  - split: train
    path: predict_future_sales/train-*
  - split: validation
    path: predict_future_sales/validation-*
  - split: test
    path: predict_future_sales/test-*
- config_name: predict_future_sales_v2
  data_files:
  - split: train
    path: predict_future_sales_v2/train-*
  - split: validation
    path: predict_future_sales_v2/validation-*
  - split: test
    path: predict_future_sales_v2/test-*
- config_name: pycbc_inference
  data_files:
  - split: train
    path: pycbc_inference/train-*
  - split: validation
    path: pycbc_inference/validation-*
  - split: test
    path: pycbc_inference/test-*
- config_name: pycbc_search
  data_files:
  - split: train
    path: pycbc_search/train-*
  - split: validation
    path: pycbc_search/validation-*
  - split: test
    path: pycbc_search/test-*
- config_name: somospie
  data_files:
  - split: train
    path: somospie/train-*
  - split: validation
    path: somospie/validation-*
  - split: test
    path: somospie/test-*
- config_name: variant_calling
  data_files:
  - split: train
    path: variant_calling/train-*
  - split: validation
    path: variant_calling/validation-*
  - split: test
    path: variant_calling/test-*
---
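The `configs` section maps every configuration and split to a path pattern such as `1000genome/train-*`. The Python sketch below illustrates that mapping under stated assumptions: the shard file names are hypothetical (the repository's actual file names are not listed on this page), `data_file_patterns` and `assign_splits` are illustrative helpers, and the standard-library `fnmatch` stands in for the glob resolution the Hugging Face `datasets` library performs.

```python
from __future__ import annotations

from fnmatch import fnmatch

# The 15 configuration names declared in the card's `configs` section.
CONFIG_NAMES = [
    "1000genome", "1000genome_v2", "casa_nowcast_full", "casa_wind_full",
    "eht_difmap", "eht_imaging", "eht_smili", "montage", "montage_v2",
    "predict_future_sales", "predict_future_sales_v2", "pycbc_inference",
    "pycbc_search", "somospie", "variant_calling",
]
SPLITS = ("train", "validation", "test")


def data_file_patterns(config: str) -> dict[str, str]:
    """Rebuild the per-split patterns the card declares, e.g. '1000genome/train-*'."""
    return {split: f"{config}/{split}-*" for split in SPLITS}


def assign_splits(config: str, repo_files: list[str]) -> dict[str, list[str]]:
    """Group repository files into splits using one configuration's patterns.

    fnmatch is only a stand-in: unlike a path-aware glob, its '*' also
    matches '/', which makes no difference for flat shard names like these.
    """
    patterns = data_file_patterns(config)
    return {
        split: sorted(f for f in repo_files if fnmatch(f, pattern))
        for split, pattern in patterns.items()
    }


# Hypothetical shard names; the real files in the repository may differ.
files = [
    "1000genome/train-00000-of-00001.parquet",
    "1000genome/validation-00000-of-00001.parquet",
    "1000genome/test-00000-of-00001.parquet",
    "1000genome_v2/train-00000-of-00001.parquet",
]
print(assign_splits("1000genome", files))
```

Because each pattern includes the configuration directory and a trailing `/`, the `1000genome` patterns do not pick up files under `1000genome_v2/`.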
Provided by:
cshjin
Summary of the original information

Dataset overview

Basic information

  • Language: English
  • License: MIT
  • Size category: 10M<n<100M
  • Task category: text classification
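Every configuration declares the same two features, a `text` string and an int64 `label`, so loading one for text classification can be sketched with the Hugging Face `datasets` library. This is a minimal sketch assuming `datasets` is installed and the Hub is reachable; `load_split` and `describe` are hypothetical helper names, and the import is deferred so the module itself loads without the package. To go through the mirror linked above, `huggingface_hub` reads the `HF_ENDPOINT` environment variable (e.g. `https://hf-mirror.com`), which has to be set before the library is imported.

```python
REPO_ID = "cshjin/poseidon"


def load_split(config: str, split: str = "train"):
    """Return one split of one configuration as a datasets.Dataset.

    Assumes the `datasets` package is installed and the Hub (or a mirror)
    is reachable; the import is deferred so this module imports without it.
    """
    from datasets import load_dataset

    # The second positional argument of load_dataset is the config name.
    return load_dataset(REPO_ID, config, split=split)


def describe(example: dict) -> str:
    """Format one record: an integer `label` and the first 60 chars of `text`."""
    return f"label={example['label']} text={example['text'][:60]!r}"
```

For example, `load_split("1000genome", "validation")` would be expected to return the 4795 rows the card lists for that split, provided the hosted files match the card.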

Dataset configuration details

1000genome

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 4610059 bytes, 33565 examples
    • validation: 658584 bytes, 4795 examples
    • test: 1317341 bytes, 9590 examples
  • Download size: 1017503 bytes
  • Dataset size: 6585984 bytes

1000genome_v2

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 9054444 bytes, 38469 examples
    • validation: 1131626 bytes, 4809 examples
    • test: 1131813 bytes, 4809 examples
  • Download size: 2125750 bytes
  • Dataset size: 11317883 bytes

casa_nowcast_full

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 85064826 bytes, 613270 examples
    • validation: 12152060 bytes, 87610 examples
    • test: 24303924 bytes, 175221 examples
  • Download size: 22669804 bytes
  • Dataset size: 121520810 bytes

casa_wind_full

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 13960352 bytes, 103194 examples
    • validation: 1993998 bytes, 14741 examples
    • test: 3988544 bytes, 29485 examples
  • Download size: 2129250 bytes
  • Dataset size: 19942894 bytes

eht_difmap

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 1569071 bytes, 11573 examples
    • validation: 224133 bytes, 1653 examples
    • test: 448293 bytes, 3307 examples
  • Download size: 204649 bytes
  • Dataset size: 2241497 bytes

eht_imaging

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 568309 bytes, 4208 examples
    • validation: 81173 bytes, 601 examples
    • test: 162491 bytes, 1203 examples
  • Download size: 48737 bytes
  • Dataset size: 811973 bytes

eht_smili

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 762885 bytes, 5622 examples
    • validation: 109004 bytes, 803 examples
    • test: 218119 bytes, 1607 examples
  • Download size: 80885 bytes
  • Dataset size: 1090008 bytes

montage

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 16573939 bytes, 120735 examples
    • validation: 2367854 bytes, 17249 examples
    • test: 4734236 bytes, 34496 examples
  • Download size: 4063598 bytes
  • Dataset size: 23676029 bytes

montage_v2

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 31727377 bytes, 137984 examples
    • validation: 3966580 bytes, 17248 examples
    • test: 3966361 bytes, 17248 examples
  • Download size: 6738078 bytes
  • Dataset size: 39660318 bytes

predict_future_sales

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 8440649 bytes, 62369 examples
    • validation: 1206088 bytes, 8911 examples
    • test: 2411503 bytes, 17820 examples
  • Download size: 1294454 bytes
  • Dataset size: 12058240 bytes

predict_future_sales_v2

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 16187818 bytes, 71280 examples
    • validation: 2023245 bytes, 8910 examples
    • test: 2023245 bytes, 8910 examples
  • Download size: 2711798 bytes
  • Dataset size: 20234308 bytes

pycbc_inference

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 802964 bytes, 5973 examples
    • validation: 114882 bytes, 854 examples
    • test: 229474 bytes, 1707 examples
  • Download size: 57788 bytes
  • Dataset size: 1147320 bytes

pycbc_search

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 17648101 bytes, 130345 examples
    • validation: 2521321 bytes, 18621 examples
    • test: 5042328 bytes, 37242 examples
  • Download size: 2896429 bytes
  • Dataset size: 25211750 bytes

somospie

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 2867211 bytes, 21210 examples
    • validation: 409350 bytes, 3029 examples
    • test: 819246 bytes, 6061 examples
  • Download size: 380228 bytes
  • Dataset size: 4095807 bytes

variant_calling

  • Features:
    • text: string
    • label: 64-bit integer (int64)
  • Splits:
    • train: 14020893 bytes, 102581 examples
    • validation: 2003146 bytes, 14654 examples
    • test: 4005924 bytes, 29310 examples
  • Download size: 2682052 bytes
  • Dataset size: 20029963 bytes
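The figures above can be cross-checked with simple arithmetic. In the configurations checked here, the three split byte counts sum exactly to the dataset size, and the example counts fall at roughly 70/10/20 (train/validation/test) for the original configurations and 80/10/10 for the `_v2` ones. A small Python sketch of that check, with numbers copied from the lists above (only three configurations, for brevity; `split_fractions` and `bytes_consistent` are illustrative helper names):

```python
# (train, validation, test) example counts and byte sizes copied from the
# configuration details above; only three configurations are included.
SPLIT_STATS = {
    "1000genome": {
        "examples": (33565, 4795, 9590),
        "bytes": (4610059, 658584, 1317341),
        "dataset_size": 6585984,
    },
    "1000genome_v2": {
        "examples": (38469, 4809, 4809),
        "bytes": (9054444, 1131626, 1131813),
        "dataset_size": 11317883,
    },
    "casa_nowcast_full": {
        "examples": (613270, 87610, 175221),
        "bytes": (85064826, 12152060, 24303924),
        "dataset_size": 121520810,
    },
}


def split_fractions(counts):
    """Share of examples in each split, rounded to two decimals."""
    total = sum(counts)
    return tuple(round(c / total, 2) for c in counts)


def bytes_consistent(stats):
    """True when the split byte counts add up to the listed dataset size."""
    return sum(stats["bytes"]) == stats["dataset_size"]


for name, stats in SPLIT_STATS.items():
    print(name, split_fractions(stats["examples"]), bytes_consistent(stats))
```

Running it should report `(0.7, 0.1, 0.2)` for `1000genome` and `casa_nowcast_full`, `(0.8, 0.1, 0.1)` for `1000genome_v2`, and `True` for the byte check in all three.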