cshjin/poseidon
Source: Hugging Face. Updated 2024-06-11; indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/cshjin/poseidon
Resource description:
---
language:
- en
license: mit
size_categories:
- 10M<n<100M
task_categories:
- text-classification
dataset_info:
- config_name: 1000genome
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 4610059
    num_examples: 33565
  - name: validation
    num_bytes: 658584
    num_examples: 4795
  - name: test
    num_bytes: 1317341
    num_examples: 9590
  download_size: 1017503
  dataset_size: 6585984
- config_name: 1000genome_v2
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 9054444
    num_examples: 38469
  - name: validation
    num_bytes: 1131626
    num_examples: 4809
  - name: test
    num_bytes: 1131813
    num_examples: 4809
  download_size: 2125750
  dataset_size: 11317883
- config_name: casa_nowcast_full
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 85064826
    num_examples: 613270
  - name: validation
    num_bytes: 12152060
    num_examples: 87610
  - name: test
    num_bytes: 24303924
    num_examples: 175221
  download_size: 22669804
  dataset_size: 121520810
- config_name: casa_wind_full
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 13960352
    num_examples: 103194
  - name: validation
    num_bytes: 1993998
    num_examples: 14741
  - name: test
    num_bytes: 3988544
    num_examples: 29485
  download_size: 2129250
  dataset_size: 19942894
- config_name: eht_difmap
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 1569071
    num_examples: 11573
  - name: validation
    num_bytes: 224133
    num_examples: 1653
  - name: test
    num_bytes: 448293
    num_examples: 3307
  download_size: 204649
  dataset_size: 2241497
- config_name: eht_imaging
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 568309
    num_examples: 4208
  - name: validation
    num_bytes: 81173
    num_examples: 601
  - name: test
    num_bytes: 162491
    num_examples: 1203
  download_size: 48737
  dataset_size: 811973
- config_name: eht_smili
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 762885
    num_examples: 5622
  - name: validation
    num_bytes: 109004
    num_examples: 803
  - name: test
    num_bytes: 218119
    num_examples: 1607
  download_size: 80885
  dataset_size: 1090008
- config_name: montage
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 16573939
    num_examples: 120735
  - name: validation
    num_bytes: 2367854
    num_examples: 17249
  - name: test
    num_bytes: 4734236
    num_examples: 34496
  download_size: 4063598
  dataset_size: 23676029
- config_name: montage_v2
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 31727377
    num_examples: 137984
  - name: validation
    num_bytes: 3966580
    num_examples: 17248
  - name: test
    num_bytes: 3966361
    num_examples: 17248
  download_size: 6738078
  dataset_size: 39660318
- config_name: predict_future_sales
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 8440649
    num_examples: 62369
  - name: validation
    num_bytes: 1206088
    num_examples: 8911
  - name: test
    num_bytes: 2411503
    num_examples: 17820
  download_size: 1294454
  dataset_size: 12058240
- config_name: predict_future_sales_v2
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 16187818
    num_examples: 71280
  - name: validation
    num_bytes: 2023245
    num_examples: 8910
  - name: test
    num_bytes: 2023245
    num_examples: 8910
  download_size: 2711798
  dataset_size: 20234308
- config_name: pycbc_inference
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 802964
    num_examples: 5973
  - name: validation
    num_bytes: 114882
    num_examples: 854
  - name: test
    num_bytes: 229474
    num_examples: 1707
  download_size: 57788
  dataset_size: 1147320
- config_name: pycbc_search
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 17648101
    num_examples: 130345
  - name: validation
    num_bytes: 2521321
    num_examples: 18621
  - name: test
    num_bytes: 5042328
    num_examples: 37242
  download_size: 2896429
  dataset_size: 25211750
- config_name: somospie
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 2867211
    num_examples: 21210
  - name: validation
    num_bytes: 409350
    num_examples: 3029
  - name: test
    num_bytes: 819246
    num_examples: 6061
  download_size: 380228
  dataset_size: 4095807
- config_name: variant_calling
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 14020893
    num_examples: 102581
  - name: validation
    num_bytes: 2003146
    num_examples: 14654
  - name: test
    num_bytes: 4005924
    num_examples: 29310
  download_size: 2682052
  dataset_size: 20029963
configs:
- config_name: 1000genome
  data_files:
  - split: train
    path: 1000genome/train-*
  - split: validation
    path: 1000genome/validation-*
  - split: test
    path: 1000genome/test-*
- config_name: 1000genome_v2
  data_files:
  - split: train
    path: 1000genome_v2/train-*
  - split: validation
    path: 1000genome_v2/validation-*
  - split: test
    path: 1000genome_v2/test-*
- config_name: casa_nowcast_full
  data_files:
  - split: train
    path: casa_nowcast_full/train-*
  - split: validation
    path: casa_nowcast_full/validation-*
  - split: test
    path: casa_nowcast_full/test-*
- config_name: casa_wind_full
  data_files:
  - split: train
    path: casa_wind_full/train-*
  - split: validation
    path: casa_wind_full/validation-*
  - split: test
    path: casa_wind_full/test-*
- config_name: eht_difmap
  data_files:
  - split: train
    path: eht_difmap/train-*
  - split: validation
    path: eht_difmap/validation-*
  - split: test
    path: eht_difmap/test-*
- config_name: eht_imaging
  data_files:
  - split: train
    path: eht_imaging/train-*
  - split: validation
    path: eht_imaging/validation-*
  - split: test
    path: eht_imaging/test-*
- config_name: eht_smili
  data_files:
  - split: train
    path: eht_smili/train-*
  - split: validation
    path: eht_smili/validation-*
  - split: test
    path: eht_smili/test-*
- config_name: montage
  data_files:
  - split: train
    path: montage/train-*
  - split: validation
    path: montage/validation-*
  - split: test
    path: montage/test-*
- config_name: montage_v2
  data_files:
  - split: train
    path: montage_v2/train-*
  - split: validation
    path: montage_v2/validation-*
  - split: test
    path: montage_v2/test-*
- config_name: predict_future_sales
  data_files:
  - split: train
    path: predict_future_sales/train-*
  - split: validation
    path: predict_future_sales/validation-*
  - split: test
    path: predict_future_sales/test-*
- config_name: predict_future_sales_v2
  data_files:
  - split: train
    path: predict_future_sales_v2/train-*
  - split: validation
    path: predict_future_sales_v2/validation-*
  - split: test
    path: predict_future_sales_v2/test-*
- config_name: pycbc_inference
  data_files:
  - split: train
    path: pycbc_inference/train-*
  - split: validation
    path: pycbc_inference/validation-*
  - split: test
    path: pycbc_inference/test-*
- config_name: pycbc_search
  data_files:
  - split: train
    path: pycbc_search/train-*
  - split: validation
    path: pycbc_search/validation-*
  - split: test
    path: pycbc_search/test-*
- config_name: somospie
  data_files:
  - split: train
    path: somospie/train-*
  - split: validation
    path: somospie/validation-*
  - split: test
    path: somospie/test-*
- config_name: variant_calling
  data_files:
  - split: train
    path: variant_calling/train-*
  - split: validation
    path: variant_calling/validation-*
  - split: test
    path: variant_calling/test-*
---
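The metadata above defines fifteen named configs, each with `train`, `validation`, and `test` splits. A minimal sketch of loading one of them follows, assuming the Hugging Face `datasets` library is installed and the Hub is reachable. The repository ID is taken from the page title; `REPO_ID`, `CONFIG_NAMES`, and `load_poseidon` are illustrative names, not part of the dataset.

```python
from typing import Any

# Repository ID as shown in the page title (assumed to be the Hub dataset ID).
REPO_ID = "cshjin/poseidon"

# Config names copied from the `configs:` list in the card metadata.
CONFIG_NAMES = [
    "1000genome", "1000genome_v2",
    "casa_nowcast_full", "casa_wind_full",
    "eht_difmap", "eht_imaging", "eht_smili",
    "montage", "montage_v2",
    "predict_future_sales", "predict_future_sales_v2",
    "pycbc_inference", "pycbc_search",
    "somospie", "variant_calling",
]

SPLITS = ("train", "validation", "test")


def load_poseidon(config_name: str, split: str = "train") -> Any:
    """Load one split of one config (hypothetical helper, not from the card)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {config_name!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    # Each row is expected to have a string `text` and an integer `label`.
    return load_dataset(REPO_ID, config_name, split=split)
```

For example, `load_poseidon("eht_imaging", "validation")` should return the split that the metadata lists with 601 examples, provided the repository still matches this card.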
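Provider:
cshjin
Summary of original information
Dataset overview
Basic information
- Language: English
- License: MIT
- Size category: 10M<n<100M
- Task category: text classification
Dataset configuration details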
1000genome
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 4610059 bytes, 33565 examples
  - validation: 658584 bytes, 4795 examples
  - test: 1317341 bytes, 9590 examples
- Download size: 1017503 bytes
- Dataset size: 6585984 bytes
1000genome_v2
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 9054444 bytes, 38469 examples
  - validation: 1131626 bytes, 4809 examples
  - test: 1131813 bytes, 4809 examples
- Download size: 2125750 bytes
- Dataset size: 11317883 bytes
casa_nowcast_full
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 85064826 bytes, 613270 examples
  - validation: 12152060 bytes, 87610 examples
  - test: 24303924 bytes, 175221 examples
- Download size: 22669804 bytes
- Dataset size: 121520810 bytes
casa_wind_full
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 13960352 bytes, 103194 examples
  - validation: 1993998 bytes, 14741 examples
  - test: 3988544 bytes, 29485 examples
- Download size: 2129250 bytes
- Dataset size: 19942894 bytes
eht_difmap
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 1569071 bytes, 11573 examples
  - validation: 224133 bytes, 1653 examples
  - test: 448293 bytes, 3307 examples
- Download size: 204649 bytes
- Dataset size: 2241497 bytes
eht_imaging
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 568309 bytes, 4208 examples
  - validation: 81173 bytes, 601 examples
  - test: 162491 bytes, 1203 examples
- Download size: 48737 bytes
- Dataset size: 811973 bytes
eht_smili
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 762885 bytes, 5622 examples
  - validation: 109004 bytes, 803 examples
  - test: 218119 bytes, 1607 examples
- Download size: 80885 bytes
- Dataset size: 1090008 bytes
montage
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 16573939 bytes, 120735 examples
  - validation: 2367854 bytes, 17249 examples
  - test: 4734236 bytes, 34496 examples
- Download size: 4063598 bytes
- Dataset size: 23676029 bytes
montage_v2
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 31727377 bytes, 137984 examples
  - validation: 3966580 bytes, 17248 examples
  - test: 3966361 bytes, 17248 examples
- Download size: 6738078 bytes
- Dataset size: 39660318 bytes
predict_future_sales
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 8440649 bytes, 62369 examples
  - validation: 1206088 bytes, 8911 examples
  - test: 2411503 bytes, 17820 examples
- Download size: 1294454 bytes
- Dataset size: 12058240 bytes
predict_future_sales_v2
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 16187818 bytes, 71280 examples
  - validation: 2023245 bytes, 8910 examples
  - test: 2023245 bytes, 8910 examples
- Download size: 2711798 bytes
- Dataset size: 20234308 bytes
pycbc_inference
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 802964 bytes, 5973 examples
  - validation: 114882 bytes, 854 examples
  - test: 229474 bytes, 1707 examples
- Download size: 57788 bytes
- Dataset size: 1147320 bytes
pycbc_search
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 17648101 bytes, 130345 examples
  - validation: 2521321 bytes, 18621 examples
  - test: 5042328 bytes, 37242 examples
- Download size: 2896429 bytes
- Dataset size: 25211750 bytes
somospie
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 2867211 bytes, 21210 examples
  - validation: 409350 bytes, 3029 examples
  - test: 819246 bytes, 6061 examples
- Download size: 380228 bytes
- Dataset size: 4095807 bytes
variant_calling
- Features:
  - text: string
  - label: int64
- Splits:
  - train: 14020893 bytes, 102581 examples
  - validation: 2003146 bytes, 14654 examples
  - test: 4005924 bytes, 29310 examples
- Download size: 2682052 bytes
- Dataset size: 20029963 bytes
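The card does not say how the splits were produced, but the example counts listed above allow a quick check of the proportions. The sketch below is plain arithmetic on a subset of those counts. `SPLIT_COUNTS` and `split_fractions` are illustrative names introduced here, not part of the dataset.

```python
# (train, validation, test) example counts copied from the card for six configs.
SPLIT_COUNTS = {
    "1000genome": (33565, 4795, 9590),
    "1000genome_v2": (38469, 4809, 4809),
    "montage": (120735, 17249, 34496),
    "montage_v2": (137984, 17248, 17248),
    "predict_future_sales": (62369, 8911, 17820),
    "predict_future_sales_v2": (71280, 8910, 8910),
}


def split_fractions(train: int, validation: int, test: int) -> tuple:
    """Return each split's share of the total, rounded to three decimals."""
    total = train + validation + test
    return tuple(round(n / total, 3) for n in (train, validation, test))


for name, counts in SPLIT_COUNTS.items():
    print(f"{name:26s} total={sum(counts):7d} fractions={split_fractions(*counts)}")
```

On these six configs the original versions come out at about 70/10/20 (train/validation/test) and the `_v2` versions at about 80/10/10. This is an observation from the listed counts, not a documented design choice.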



