iamnguyen/mt_pubmed
Source: Hugging Face. Updated 2024-04-22; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/iamnguyen/mt_pubmed
Resource description:
---
dataset_info:
- config_name: set_1
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1464831369
    num_examples: 1000000
  download_size: 772226982
  dataset_size: 1464831369
- config_name: set_10
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1528776855
    num_examples: 1000000
  download_size: 807470466
  dataset_size: 1528776855
- config_name: set_11
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1498716030
    num_examples: 1000000
  download_size: 791675639
  dataset_size: 1498716030
- config_name: set_12
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1478646211
    num_examples: 1000000
  download_size: 780165814
  dataset_size: 1478646211
- config_name: set_13
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1488911510
    num_examples: 1000000
  download_size: 785584693
  dataset_size: 1488911510
- config_name: set_14
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1503381374
    num_examples: 1000000
  download_size: 793973006
  dataset_size: 1503381374
- config_name: set_15
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1475495040
    num_examples: 1000000
  download_size: 779631439
  dataset_size: 1475495040
- config_name: set_16
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1482170185
    num_examples: 1000000
  download_size: 782721396
  dataset_size: 1482170185
- config_name: set_17
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1464522405
    num_examples: 1000000
  download_size: 773956832
  dataset_size: 1464522405
- config_name: set_18
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1477216021
    num_examples: 1000000
  download_size: 780825924
  dataset_size: 1477216021
- config_name: set_19
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1490472782
    num_examples: 1000000
  download_size: 786630391
  dataset_size: 1490472782
- config_name: set_2
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1455795103
    num_examples: 1000000
  download_size: 769340213
  dataset_size: 1455795103
- config_name: set_20
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1502984580
    num_examples: 1000000
  download_size: 793281203
  dataset_size: 1502984580
- config_name: set_21
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 122577634
    num_examples: 87006
  download_size: 64514389
  dataset_size: 122577634
- config_name: set_3
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1453130815
    num_examples: 1000000
  download_size: 767783648
  dataset_size: 1453130815
- config_name: set_4
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1475168792
    num_examples: 1000000
  download_size: 779109544
  dataset_size: 1475168792
- config_name: set_5
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1460964241
    num_examples: 1000000
  download_size: 770767654
  dataset_size: 1460964241
- config_name: set_6
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1442665319
    num_examples: 1000000
  download_size: 761422204
  dataset_size: 1442665319
- config_name: set_7
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1479179138
    num_examples: 1000000
  download_size: 780692266
  dataset_size: 1479179138
- config_name: set_8
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1457200229
    num_examples: 1000000
  download_size: 769048651
  dataset_size: 1457200229
- config_name: set_9
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1477252906
    num_examples: 1000000
  download_size: 780643074
  dataset_size: 1477252906
configs:
- config_name: set_1
  data_files:
  - split: train
    path: set_1/train-*
- config_name: set_10
  data_files:
  - split: train
    path: set_10/train-*
- config_name: set_11
  data_files:
  - split: train
    path: set_11/train-*
- config_name: set_12
  data_files:
  - split: train
    path: set_12/train-*
- config_name: set_13
  data_files:
  - split: train
    path: set_13/train-*
- config_name: set_14
  data_files:
  - split: train
    path: set_14/train-*
- config_name: set_15
  data_files:
  - split: train
    path: set_15/train-*
- config_name: set_16
  data_files:
  - split: train
    path: set_16/train-*
- config_name: set_17
  data_files:
  - split: train
    path: set_17/train-*
- config_name: set_18
  data_files:
  - split: train
    path: set_18/train-*
- config_name: set_19
  data_files:
  - split: train
    path: set_19/train-*
- config_name: set_2
  data_files:
  - split: train
    path: set_2/train-*
- config_name: set_20
  data_files:
  - split: train
    path: set_20/train-*
- config_name: set_21
  data_files:
  - split: train
    path: set_21/train-*
- config_name: set_3
  data_files:
  - split: train
    path: set_3/train-*
- config_name: set_4
  data_files:
  - split: train
    path: set_4/train-*
- config_name: set_5
  data_files:
  - split: train
    path: set_5/train-*
- config_name: set_6
  data_files:
  - split: train
    path: set_6/train-*
- config_name: set_7
  data_files:
  - split: train
    path: set_7/train-*
- config_name: set_8
  data_files:
  - split: train
    path: set_8/train-*
- config_name: set_9
  data_files:
  - split: train
    path: set_9/train-*
---
Provider:
iamnguyen
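Summary of the original information
Dataset overview
Dataset configurations
The dataset is divided into multiple configurations, each corresponding to a different subset of the data:
set_1 through set_21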
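As an illustration of how the configurations might be used, here is a minimal sketch that enumerates the 21 configuration names and loads one of them. It assumes the Hugging Face `datasets` library is installed and that the repository is reachable; the helper names `config_names` and `load_config` are hypothetical and not part of the dataset card.

```python
from typing import List

# Repository identifier as it appears at the top of the card.
REPO_ID = "iamnguyen/mt_pubmed"


def config_names(first: int = 1, last: int = 21) -> List[str]:
    """Return the configuration names set_1 ... set_21 in numeric order."""
    return [f"set_{i}" for i in range(first, last + 1)]


def load_config(name: str, streaming: bool = True):
    """Load the train split of one configuration (hypothetical helper).

    Streaming is on by default because each full configuration is
    listed at well over 700 MB of download.
    """
    if name not in config_names():
        raise ValueError(f"unknown configuration: {name!r}")
    # Imported lazily so the helpers above work without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="train", streaming=streaming)
```

Calling `load_config("set_21")` and iterating over the result should yield records with the `en` and `vi` string fields. `set_21` is the smallest configuration and is a convenient one to try first.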
Dataset features
All configurations share the same features:
en: string
vi: string
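Dataset splits
Each configuration has a single training split (train), described by the following fields:
num_examples: each training split contains 1000000 examples, except set_21, which contains 87006.
num_bytes: the size of the training split in bytes; the value differs between configurations.
download_size: the size in bytes of the files downloaded for the configuration.
dataset_size: the size in bytes of the dataset itself; in every configuration listed it equals num_bytes of the train split.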
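The split figures can be combined with a little arithmetic. The sketch below is illustrative only: it totals the examples across all 21 configurations and computes the ratio of download size to dataset size for set_1, using numbers copied from the metadata above. The function and constant names are hypothetical.

```python
# Counts taken from the card: 20 configurations of 1,000,000 examples each,
# plus set_21 with 87,006 examples.
FULL_CONFIGS = 20
EXAMPLES_PER_FULL_CONFIG = 1_000_000
EXAMPLES_IN_SET_21 = 87_006

# Sizes in bytes for set_1, copied from the metadata.
SET_1_DOWNLOAD_SIZE = 772_226_982
SET_1_DATASET_SIZE = 1_464_831_369


def total_examples() -> int:
    """Total number of en/vi pairs across all configurations."""
    return FULL_CONFIGS * EXAMPLES_PER_FULL_CONFIG + EXAMPLES_IN_SET_21


def download_ratio(download_size: int, dataset_size: int) -> float:
    """Fraction of the dataset size that has to be downloaded."""
    return download_size / dataset_size


print(total_examples())  # 20087006
print(round(download_ratio(SET_1_DOWNLOAD_SIZE, SET_1_DATASET_SIZE), 2))  # 0.53
```

For set_1 the download is roughly half the size of the dataset itself.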
Dataset file paths
The data file paths follow the same pattern in every configuration, for example:
set_1: set_1/train-*
set_10: set_10/train-*
...
set_21: set_21/train-*
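The path pattern can be expressed as a small helper. The sketch below builds the glob for a configuration and checks whether a file path falls under it, using the standard-library `fnmatch` module. The file name in the example (`train-00000-of-00003.parquet`) is a made-up placeholder, not a name confirmed by the card, and the helper names are hypothetical.

```python
import fnmatch


def data_glob(config: str) -> str:
    """Glob pattern for the train files of one configuration."""
    return f"{config}/train-*"


def belongs_to(path: str, config: str) -> bool:
    """True if `path` matches the train-file pattern of `config`."""
    return fnmatch.fnmatchcase(path, data_glob(config))


# Hypothetical file name, used only to exercise the pattern.
example = "set_1/train-00000-of-00003.parquet"
print(belongs_to(example, "set_1"))   # True
print(belongs_to(example, "set_10"))  # False

# The metadata lists configurations in lexicographic order, which is why
# set_10 follows set_1 and set_2 comes after set_19.
names = sorted(f"set_{i}" for i in range(1, 22))
print(names[:3])  # ['set_1', 'set_10', 'set_11']
```

Because the pattern includes the `/` after the configuration name, `set_1/train-*` does not accidentally match files under `set_10/`.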



