gayanin/pubmed-abstracts-dist-noised-v2
Source: Hugging Face | Updated: 2024-02-12 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/gayanin/pubmed-abstracts-dist-noised-v2
Resource description:
---
dataset_info:
- config_name: babylon-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6595134
    num_examples: 24908
  - name: test
    num_bytes: 816662
    num_examples: 3113
  - name: validation
    num_bytes: 798507
    num_examples: 3114
  download_size: 4608640
  dataset_size: 8210303
- config_name: babylon-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6649302
    num_examples: 24908
  - name: test
    num_bytes: 823956
    num_examples: 3113
  - name: validation
    num_bytes: 804360
    num_examples: 3114
  download_size: 4709047
  dataset_size: 8277618
- config_name: babylon-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6704389
    num_examples: 24908
  - name: test
    num_bytes: 830323
    num_examples: 3113
  - name: validation
    num_bytes: 811675
    num_examples: 3114
  download_size: 4797674
  dataset_size: 8346387
- config_name: gcd-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6448182
    num_examples: 24908
  - name: test
    num_bytes: 813378
    num_examples: 3113
  - name: validation
    num_bytes: 802452
    num_examples: 3114
  download_size: 4503363
  dataset_size: 8064012
- config_name: gcd-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6431922
    num_examples: 24908
  - name: test
    num_bytes: 810489
    num_examples: 3113
  - name: validation
    num_bytes: 800488
    num_examples: 3114
  download_size: 4550524
  dataset_size: 8042899
- config_name: gcd-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6416391
    num_examples: 24908
  - name: test
    num_bytes: 808759
    num_examples: 3113
  - name: validation
    num_bytes: 797257
    num_examples: 3114
  download_size: 4584136
  dataset_size: 8022407
- config_name: kaggle-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5836542
    num_examples: 24908
  - name: test
    num_bytes: 803285
    num_examples: 3114
  - name: validation
    num_bytes: 801836
    num_examples: 3114
  download_size: 4176206
  dataset_size: 7441663
- config_name: kaggle-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5811500
    num_examples: 24908
  - name: test
    num_bytes: 801272
    num_examples: 3114
  - name: validation
    num_bytes: 798472
    num_examples: 3114
  download_size: 4210456
  dataset_size: 7411244
- config_name: kaggle-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5789912
    num_examples: 24908
  - name: test
    num_bytes: 797824
    num_examples: 3114
  - name: validation
    num_bytes: 796074
    num_examples: 3114
  download_size: 4237457
  dataset_size: 7383810
- config_name: kaggle-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5761003
    num_examples: 24908
  - name: test
    num_bytes: 794947
    num_examples: 3114
  - name: validation
    num_bytes: 792732
    num_examples: 3114
  download_size: 4253250
  dataset_size: 7348682
configs:
- config_name: babylon-01
  data_files:
  - split: train
    path: babylon-01/train-*
  - split: test
    path: babylon-01/test-*
  - split: validation
    path: babylon-01/validation-*
- config_name: babylon-02
  data_files:
  - split: train
    path: babylon-02/train-*
  - split: test
    path: babylon-02/test-*
  - split: validation
    path: babylon-02/validation-*
- config_name: babylon-03
  data_files:
  - split: train
    path: babylon-03/train-*
  - split: test
    path: babylon-03/test-*
  - split: validation
    path: babylon-03/validation-*
- config_name: gcd-01
  data_files:
  - split: train
    path: gcd-01/train-*
  - split: test
    path: gcd-01/test-*
  - split: validation
    path: gcd-01/validation-*
- config_name: gcd-02
  data_files:
  - split: train
    path: gcd-02/train-*
  - split: test
    path: gcd-02/test-*
  - split: validation
    path: gcd-02/validation-*
- config_name: gcd-03
  data_files:
  - split: train
    path: gcd-03/train-*
  - split: test
    path: gcd-03/test-*
  - split: validation
    path: gcd-03/validation-*
- config_name: kaggle-01
  data_files:
  - split: train
    path: kaggle-01/train-*
  - split: test
    path: kaggle-01/test-*
  - split: validation
    path: kaggle-01/validation-*
- config_name: kaggle-02
  data_files:
  - split: train
    path: kaggle-02/train-*
  - split: test
    path: kaggle-02/test-*
  - split: validation
    path: kaggle-02/validation-*
- config_name: kaggle-03
  data_files:
  - split: train
    path: kaggle-03/train-*
  - split: test
    path: kaggle-03/test-*
  - split: validation
    path: kaggle-03/validation-*
- config_name: kaggle-04
  data_files:
  - split: train
    path: kaggle-04/train-*
  - split: test
    path: kaggle-04/test-*
  - split: validation
    path: kaggle-04/validation-*
---
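The size fields in the metadata can be cross-checked with simple arithmetic. The sketch below copies the split sizes for three of the ten configs and confirms that each `dataset_size` equals the sum of its splits' `num_bytes`. The dictionary and function names are illustrative only and are not part of the dataset card.

```python
# Split sizes in bytes, copied from the dataset_info metadata for three configs.
# SPLIT_BYTES, DATASET_SIZE and total_bytes are illustrative names.
SPLIT_BYTES = {
    "babylon-01": {"train": 6595134, "test": 816662, "validation": 798507},
    "gcd-01": {"train": 6448182, "test": 813378, "validation": 802452},
    "kaggle-01": {"train": 5836542, "test": 803285, "validation": 801836},
}

# dataset_size values reported for the same configs.
DATASET_SIZE = {
    "babylon-01": 8210303,
    "gcd-01": 8064012,
    "kaggle-01": 7441663,
}


def total_bytes(config_name):
    """Sum num_bytes over the train, test and validation splits of one config."""
    return sum(SPLIT_BYTES[config_name].values())


for name in SPLIT_BYTES:
    # Each reported dataset_size should match the sum of its split sizes.
    assert total_bytes(name) == DATASET_SIZE[name], name
    print(name, total_bytes(name))
```

Note that `download_size` is smaller than `dataset_size` in every config, so the two fields measure different things and should not be expected to match.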
Provided by:
gayanin
Summary of original information
Dataset overview
Dataset configurations
babylon-01
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6595134 bytes, 24908 examples
  - test: 816662 bytes, 3113 examples
  - validation: 798507 bytes, 3114 examples
- Download size: 4608640
- Dataset size: 8210303
babylon-02
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6649302 bytes, 24908 examples
  - test: 823956 bytes, 3113 examples
  - validation: 804360 bytes, 3114 examples
- Download size: 4709047
- Dataset size: 8277618
babylon-03
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6704389 bytes, 24908 examples
  - test: 830323 bytes, 3113 examples
  - validation: 811675 bytes, 3114 examples
- Download size: 4797674
- Dataset size: 8346387
gcd-01
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6448182 bytes, 24908 examples
  - test: 813378 bytes, 3113 examples
  - validation: 802452 bytes, 3114 examples
- Download size: 4503363
- Dataset size: 8064012
gcd-02
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6431922 bytes, 24908 examples
  - test: 810489 bytes, 3113 examples
  - validation: 800488 bytes, 3114 examples
- Download size: 4550524
- Dataset size: 8042899
gcd-03
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 6416391 bytes, 24908 examples
  - test: 808759 bytes, 3113 examples
  - validation: 797257 bytes, 3114 examples
- Download size: 4584136
- Dataset size: 8022407
kaggle-01
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5836542 bytes, 24908 examples
  - test: 803285 bytes, 3114 examples
  - validation: 801836 bytes, 3114 examples
- Download size: 4176206
- Dataset size: 7441663
kaggle-02
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5811500 bytes, 24908 examples
  - test: 801272 bytes, 3114 examples
  - validation: 798472 bytes, 3114 examples
- Download size: 4210456
- Dataset size: 7411244
kaggle-03
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5789912 bytes, 24908 examples
  - test: 797824 bytes, 3114 examples
  - validation: 796074 bytes, 3114 examples
- Download size: 4237457
- Dataset size: 7383810
kaggle-04
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 5761003 bytes, 24908 examples
  - test: 794947 bytes, 3114 examples
  - validation: 792732 bytes, 3114 examples
- Download size: 4253250
- Dataset size: 7348682
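
The configurations listed above could be enumerated and loaded roughly as follows. This is a minimal sketch that assumes the Hugging Face `datasets` library is installed and the repository is reachable. The helper names `config_names` and `load_config` are hypothetical, and this page does not document what the `refs` and `trans` columns contain.

```python
DATASET_ID = "gayanin/pubmed-abstracts-dist-noised-v2"

# Number of configs per name prefix, as listed on this page
# (babylon-01..03, gcd-01..03, kaggle-01..04).
CONFIG_COUNTS = {"babylon": 3, "gcd": 3, "kaggle": 4}


def config_names():
    """Return all ten config names in the order they appear in the metadata."""
    return [
        f"{prefix}-{index:02d}"
        for prefix, count in CONFIG_COUNTS.items()
        for index in range(1, count + 1)
    ]


def load_config(config_name, split="train"):
    """Load one split of one config (hypothetical helper; needs network access)."""
    if config_name not in config_names():
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so that config_names() works without the library installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, config_name, split=split)


# Example call, left commented out because it downloads data:
# validation = load_config("babylon-01", split="validation")
# print(validation.column_names)
```

A config name must always be passed, since the repository defines ten configs and none of them is named as a default in the metadata.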



