gayanin/pubmed-abstracts-noised-with-prob-dist-v2
Hugging Face | Updated 2024-02-09 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/gayanin/pubmed-abstracts-noised-with-prob-dist-v2
Resource description:
---
dataset_info:
- config_name: babylon-prob-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6298703
    num_examples: 24908
  - name: test
    num_bytes: 794582
    num_examples: 3113
  - name: validation
    num_bytes: 784437
    num_examples: 3114
  download_size: 4438345
  dataset_size: 7877722
- config_name: babylon-prob-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6131860
    num_examples: 24908
  - name: test
    num_bytes: 772976
    num_examples: 3113
  - name: validation
    num_bytes: 763170
    num_examples: 3114
  download_size: 4431105
  dataset_size: 7668006
- config_name: babylon-prob-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5963382
    num_examples: 24908
  - name: test
    num_bytes: 751530
    num_examples: 3113
  - name: validation
    num_bytes: 743139
    num_examples: 3114
  download_size: 4411104
  dataset_size: 7458051
- config_name: babylon-prob-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5794478
    num_examples: 24908
  - name: test
    num_bytes: 730929
    num_examples: 3113
  - name: validation
    num_bytes: 720849
    num_examples: 3114
  download_size: 4374101
  dataset_size: 7246256
- config_name: babylon-prob-05
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5634718
    num_examples: 24908
  - name: test
    num_bytes: 708651
    num_examples: 3113
  - name: validation
    num_bytes: 701862
    num_examples: 3114
  download_size: 4336094
  dataset_size: 7045231
- config_name: gcd-prob-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5623412
    num_examples: 24908
  - name: test
    num_bytes: 774353
    num_examples: 3114
  - name: validation
    num_bytes: 772363
    num_examples: 3114
  download_size: 4026552
  dataset_size: 7170128
- config_name: gcd-prob-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5386733
    num_examples: 24908
  - name: test
    num_bytes: 742236
    num_examples: 3114
  - name: validation
    num_bytes: 739965
    num_examples: 3114
  download_size: 3926230
  dataset_size: 6868934
- config_name: gcd-prob-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5151749
    num_examples: 24908
  - name: test
    num_bytes: 709209
    num_examples: 3114
  - name: validation
    num_bytes: 706547
    num_examples: 3114
  download_size: 3806924
  dataset_size: 6567505
- config_name: gcd-prob-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 4914469
    num_examples: 24908
  - name: test
    num_bytes: 678027
    num_examples: 3114
  - name: validation
    num_bytes: 676635
    num_examples: 3114
  download_size: 3674828
  dataset_size: 6269131
- config_name: gcd-prob-05
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 4682536
    num_examples: 24908
  - name: test
    num_bytes: 643943
    num_examples: 3114
  - name: validation
    num_bytes: 644068
    num_examples: 3114
  download_size: 3536779
  dataset_size: 5970547
- config_name: kaggle-prob-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6254746
    num_examples: 24908
  - name: test
    num_bytes: 787330
    num_examples: 3113
  - name: validation
    num_bytes: 783533
    num_examples: 3114
  download_size: 4393817
  dataset_size: 7825609
- config_name: kaggle-prob-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6002616
    num_examples: 24908
  - name: test
    num_bytes: 753845
    num_examples: 3113
  - name: validation
    num_bytes: 751722
    num_examples: 3114
  download_size: 4291924
  dataset_size: 7508183
- config_name: kaggle-prob-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5747484
    num_examples: 24908
  - name: test
    num_bytes: 722481
    num_examples: 3113
  - name: validation
    num_bytes: 719629
    num_examples: 3114
  download_size: 4175521
  dataset_size: 7189594
- config_name: kaggle-prob-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5496897
    num_examples: 24908
  - name: test
    num_bytes: 692009
    num_examples: 3113
  - name: validation
    num_bytes: 688458
    num_examples: 3114
  download_size: 4054340
  dataset_size: 6877364
- config_name: kaggle-prob-05
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5243270
    num_examples: 24908
  - name: test
    num_bytes: 658650
    num_examples: 3113
  - name: validation
    num_bytes: 658178
    num_examples: 3114
  download_size: 3911586
  dataset_size: 6560098
configs:
- config_name: babylon-prob-01
  data_files:
  - split: train
    path: babylon-prob-01/train-*
  - split: test
    path: babylon-prob-01/test-*
  - split: validation
    path: babylon-prob-01/validation-*
- config_name: babylon-prob-02
  data_files:
  - split: train
    path: babylon-prob-02/train-*
  - split: test
    path: babylon-prob-02/test-*
  - split: validation
    path: babylon-prob-02/validation-*
- config_name: babylon-prob-03
  data_files:
  - split: train
    path: babylon-prob-03/train-*
  - split: test
    path: babylon-prob-03/test-*
  - split: validation
    path: babylon-prob-03/validation-*
- config_name: babylon-prob-04
  data_files:
  - split: train
    path: babylon-prob-04/train-*
  - split: test
    path: babylon-prob-04/test-*
  - split: validation
    path: babylon-prob-04/validation-*
- config_name: babylon-prob-05
  data_files:
  - split: train
    path: babylon-prob-05/train-*
  - split: test
    path: babylon-prob-05/test-*
  - split: validation
    path: babylon-prob-05/validation-*
- config_name: gcd-prob-01
  data_files:
  - split: train
    path: gcd-prob-01/train-*
  - split: test
    path: gcd-prob-01/test-*
  - split: validation
    path: gcd-prob-01/validation-*
- config_name: gcd-prob-02
  data_files:
  - split: train
    path: gcd-prob-02/train-*
  - split: test
    path: gcd-prob-02/test-*
  - split: validation
    path: gcd-prob-02/validation-*
- config_name: gcd-prob-03
  data_files:
  - split: train
    path: gcd-prob-03/train-*
  - split: test
    path: gcd-prob-03/test-*
  - split: validation
    path: gcd-prob-03/validation-*
- config_name: gcd-prob-04
  data_files:
  - split: train
    path: gcd-prob-04/train-*
  - split: test
    path: gcd-prob-04/test-*
  - split: validation
    path: gcd-prob-04/validation-*
- config_name: gcd-prob-05
  data_files:
  - split: train
    path: gcd-prob-05/train-*
  - split: test
    path: gcd-prob-05/test-*
  - split: validation
    path: gcd-prob-05/validation-*
- config_name: kaggle-prob-01
  data_files:
  - split: train
    path: kaggle-prob-01/train-*
  - split: test
    path: kaggle-prob-01/test-*
  - split: validation
    path: kaggle-prob-01/validation-*
- config_name: kaggle-prob-02
  data_files:
  - split: train
    path: kaggle-prob-02/train-*
  - split: test
    path: kaggle-prob-02/test-*
  - split: validation
    path: kaggle-prob-02/validation-*
- config_name: kaggle-prob-03
  data_files:
  - split: train
    path: kaggle-prob-03/train-*
  - split: test
    path: kaggle-prob-03/test-*
  - split: validation
    path: kaggle-prob-03/validation-*
- config_name: kaggle-prob-04
  data_files:
  - split: train
    path: kaggle-prob-04/train-*
  - split: test
    path: kaggle-prob-04/test-*
  - split: validation
    path: kaggle-prob-04/validation-*
- config_name: kaggle-prob-05
  data_files:
  - split: train
    path: kaggle-prob-05/train-*
  - split: test
    path: kaggle-prob-05/test-*
  - split: validation
    path: kaggle-prob-05/validation-*
---
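The card above lists fifteen configurations whose names follow the pattern `<prefix>-prob-<NN>`. As a minimal sketch, the names can be generated and one configuration loaded with the Hugging Face `datasets` library. The helper names `config_name` and `load_config` are hypothetical, and the card does not say what the prefixes or numeric suffixes mean, so the code treats them as opaque labels. It also assumes the repository is still reachable on the Hub under the ID shown in the title.

```python
from itertools import product

# Repository ID taken from the page title.
REPO_ID = "gayanin/pubmed-abstracts-noised-with-prob-dist-v2"

# Name parts as they appear in the card; their meaning is not documented there.
PREFIXES = ("babylon", "gcd", "kaggle")
LEVELS = range(1, 6)  # numeric suffixes 01..05


def config_name(prefix: str, level: int) -> str:
    """Build a config name such as 'gcd-prob-03' (hypothetical helper)."""
    if prefix not in PREFIXES or level not in LEVELS:
        raise ValueError(f"unknown configuration: {prefix!r}, {level!r}")
    return f"{prefix}-prob-{level:02d}"


# All fifteen config names, in the order the card lists them.
ALL_CONFIGS = [config_name(p, n) for p, n in product(PREFIXES, LEVELS)]


def load_config(prefix: str, level: int, split: str = "train"):
    """Load one split of one configuration (needs `datasets` and network access)."""
    # Imported lazily so the name helpers work without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name(prefix, level), split=split)
```

For example, `load_config("gcd", 3, split="validation")` would request the `validation` split of `gcd-prob-03`. Each row should then expose the two string columns named in the card, `refs` and `trans`.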
Provider:
gayanin
Summary of original information
Dataset overview
Dataset configurations
babylon-prob-01
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6298703 bytes, 24908 examples
  - test: 794582 bytes, 3113 examples
  - validation: 784437 bytes, 3114 examples
- Download size: 4438345 bytes
- Dataset size: 7877722 bytes
babylon-prob-02
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6131860 bytes, 24908 examples
  - test: 772976 bytes, 3113 examples
  - validation: 763170 bytes, 3114 examples
- Download size: 4431105 bytes
- Dataset size: 7668006 bytes
babylon-prob-03
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5963382 bytes, 24908 examples
  - test: 751530 bytes, 3113 examples
  - validation: 743139 bytes, 3114 examples
- Download size: 4411104 bytes
- Dataset size: 7458051 bytes
babylon-prob-04
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5794478 bytes, 24908 examples
  - test: 730929 bytes, 3113 examples
  - validation: 720849 bytes, 3114 examples
- Download size: 4374101 bytes
- Dataset size: 7246256 bytes
babylon-prob-05
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5634718 bytes, 24908 examples
  - test: 708651 bytes, 3113 examples
  - validation: 701862 bytes, 3114 examples
- Download size: 4336094 bytes
- Dataset size: 7045231 bytes
gcd-prob-01
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5623412 bytes, 24908 examples
  - test: 774353 bytes, 3114 examples
  - validation: 772363 bytes, 3114 examples
- Download size: 4026552 bytes
- Dataset size: 7170128 bytes
gcd-prob-02
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5386733 bytes, 24908 examples
  - test: 742236 bytes, 3114 examples
  - validation: 739965 bytes, 3114 examples
- Download size: 3926230 bytes
- Dataset size: 6868934 bytes
gcd-prob-03
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5151749 bytes, 24908 examples
  - test: 709209 bytes, 3114 examples
  - validation: 706547 bytes, 3114 examples
- Download size: 3806924 bytes
- Dataset size: 6567505 bytes
gcd-prob-04
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 4914469 bytes, 24908 examples
  - test: 678027 bytes, 3114 examples
  - validation: 676635 bytes, 3114 examples
- Download size: 3674828 bytes
- Dataset size: 6269131 bytes
gcd-prob-05
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 4682536 bytes, 24908 examples
  - test: 643943 bytes, 3114 examples
  - validation: 644068 bytes, 3114 examples
- Download size: 3536779 bytes
- Dataset size: 5970547 bytes
kaggle-prob-01
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6254746 bytes, 24908 examples
  - test: 787330 bytes, 3113 examples
  - validation: 783533 bytes, 3114 examples
- Download size: 4393817 bytes
- Dataset size: 7825609 bytes
kaggle-prob-02
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6002616 bytes, 24908 examples
  - test: 753845 bytes, 3113 examples
  - validation: 751722 bytes, 3114 examples
- Download size: 4291924 bytes
- Dataset size: 7508183 bytes
kaggle-prob-03
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5747484 bytes, 24908 examples
  - test: 722481 bytes, 3113 examples
  - validation: 719629 bytes, 3114 examples
- Download size: 4175521 bytes
- Dataset size: 7189594 bytes
kaggle-prob-04
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5496897 bytes, 24908 examples
  - test: 692009 bytes, 3113 examples
  - validation: 688458 bytes, 3114 examples
- Download size: 4054340 bytes
- Dataset size: 6877364 bytes
kaggle-prob-05
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5243270 bytes, 24908 examples
  - test: 658650 bytes, 3113 examples
  - validation: 658178 bytes, 3114 examples
- Download size: 3911586 bytes
- Dataset size: 6560098 bytes
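The figures above can be sanity-checked with a little arithmetic. The sketch below copies the numbers for three of the configurations and confirms that each reported dataset size equals the sum of its split sizes. It also computes the share of examples in each split. The names `CARD` and `split_fractions` are hypothetical, chosen only for this illustration.

```python
# (train, test, validation) byte counts and the reported dataset size,
# copied from the summary above for three of the fifteen configurations.
CARD = {
    "babylon-prob-01": ((6298703, 794582, 784437), 7877722),
    "gcd-prob-01": ((5623412, 774353, 772363), 7170128),
    "kaggle-prob-01": ((6254746, 787330, 783533), 7825609),
}


def split_fractions(counts):
    """Return each split's share of the total example count."""
    total = sum(counts)
    return tuple(n / total for n in counts)


for name, (split_bytes, dataset_size) in CARD.items():
    # The reported dataset size should be the sum of the per-split sizes.
    assert sum(split_bytes) == dataset_size, name

# Example counts for babylon-prob-01: train, test, validation.
train, test, validation = split_fractions((24908, 3113, 3114))
print(f"train={train:.3f} test={test:.3f} validation={validation:.3f}")
```

With 24908 training examples out of 31135 in total, the split is roughly 80/10/10.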



