
iamnguyen/mt_pubmed

Source: Hugging Face. Updated 2024-04-22. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/iamnguyen/mt_pubmed
Resource description:
---
dataset_info:
- config_name: set_1
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1464831369
    num_examples: 1000000
  download_size: 772226982
  dataset_size: 1464831369
- config_name: set_10
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1528776855
    num_examples: 1000000
  download_size: 807470466
  dataset_size: 1528776855
- config_name: set_11
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1498716030
    num_examples: 1000000
  download_size: 791675639
  dataset_size: 1498716030
- config_name: set_12
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1478646211
    num_examples: 1000000
  download_size: 780165814
  dataset_size: 1478646211
- config_name: set_13
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1488911510
    num_examples: 1000000
  download_size: 785584693
  dataset_size: 1488911510
- config_name: set_14
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1503381374
    num_examples: 1000000
  download_size: 793973006
  dataset_size: 1503381374
- config_name: set_15
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1475495040
    num_examples: 1000000
  download_size: 779631439
  dataset_size: 1475495040
- config_name: set_16
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1482170185
    num_examples: 1000000
  download_size: 782721396
  dataset_size: 1482170185
- config_name: set_17
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1464522405
    num_examples: 1000000
  download_size: 773956832
  dataset_size: 1464522405
- config_name: set_18
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1477216021
    num_examples: 1000000
  download_size: 780825924
  dataset_size: 1477216021
- config_name: set_19
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1490472782
    num_examples: 1000000
  download_size: 786630391
  dataset_size: 1490472782
- config_name: set_2
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1455795103
    num_examples: 1000000
  download_size: 769340213
  dataset_size: 1455795103
- config_name: set_20
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1502984580
    num_examples: 1000000
  download_size: 793281203
  dataset_size: 1502984580
- config_name: set_21
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 122577634
    num_examples: 87006
  download_size: 64514389
  dataset_size: 122577634
- config_name: set_3
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1453130815
    num_examples: 1000000
  download_size: 767783648
  dataset_size: 1453130815
- config_name: set_4
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1475168792
    num_examples: 1000000
  download_size: 779109544
  dataset_size: 1475168792
- config_name: set_5
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1460964241
    num_examples: 1000000
  download_size: 770767654
  dataset_size: 1460964241
- config_name: set_6
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1442665319
    num_examples: 1000000
  download_size: 761422204
  dataset_size: 1442665319
- config_name: set_7
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1479179138
    num_examples: 1000000
  download_size: 780692266
  dataset_size: 1479179138
- config_name: set_8
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1457200229
    num_examples: 1000000
  download_size: 769048651
  dataset_size: 1457200229
- config_name: set_9
  features:
  - name: en
    dtype: string
  - name: vi
    dtype: string
  splits:
  - name: train
    num_bytes: 1477252906
    num_examples: 1000000
  download_size: 780643074
  dataset_size: 1477252906
configs:
- config_name: set_1
  data_files:
  - split: train
    path: set_1/train-*
- config_name: set_10
  data_files:
  - split: train
    path: set_10/train-*
- config_name: set_11
  data_files:
  - split: train
    path: set_11/train-*
- config_name: set_12
  data_files:
  - split: train
    path: set_12/train-*
- config_name: set_13
  data_files:
  - split: train
    path: set_13/train-*
- config_name: set_14
  data_files:
  - split: train
    path: set_14/train-*
- config_name: set_15
  data_files:
  - split: train
    path: set_15/train-*
- config_name: set_16
  data_files:
  - split: train
    path: set_16/train-*
- config_name: set_17
  data_files:
  - split: train
    path: set_17/train-*
- config_name: set_18
  data_files:
  - split: train
    path: set_18/train-*
- config_name: set_19
  data_files:
  - split: train
    path: set_19/train-*
- config_name: set_2
  data_files:
  - split: train
    path: set_2/train-*
- config_name: set_20
  data_files:
  - split: train
    path: set_20/train-*
- config_name: set_21
  data_files:
  - split: train
    path: set_21/train-*
- config_name: set_3
  data_files:
  - split: train
    path: set_3/train-*
- config_name: set_4
  data_files:
  - split: train
    path: set_4/train-*
- config_name: set_5
  data_files:
  - split: train
    path: set_5/train-*
- config_name: set_6
  data_files:
  - split: train
    path: set_6/train-*
- config_name: set_7
  data_files:
  - split: train
    path: set_7/train-*
- config_name: set_8
  data_files:
  - split: train
    path: set_8/train-*
- config_name: set_9
  data_files:
  - split: train
    path: set_9/train-*
---
Provider:
iamnguyen
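Summary of original information

Dataset overview

Dataset configurations

The dataset contains 21 configurations, each corresponding to a different subset of the data:

  • set_1 through set_21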
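
The configuration names can be generated rather than typed out. The sketch below is illustrative only: `CONFIG_NAMES` and `natural_key` are names introduced here, not part of the dataset. It also shows why the metadata lists set_10 directly after set_1: the entries there are in plain string order, not numeric order.

```python
import re

# Config names as described above: set_1 through set_21.
CONFIG_NAMES = [f"set_{i}" for i in range(1, 22)]


def natural_key(name: str) -> int:
    """Return the numeric suffix of a config name such as 'set_10'."""
    match = re.fullmatch(r"set_(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected config name: {name!r}")
    return int(match.group(1))


# Plain string sorting reproduces the order used in the metadata block:
# set_1, set_10, ..., set_19, set_2, set_20, set_21, set_3, ...
lexicographic = sorted(CONFIG_NAMES)

# Sorting on the numeric suffix gives set_1, set_2, ..., set_21.
numeric = sorted(CONFIG_NAMES, key=natural_key)
```

Sorting by the numeric suffix is usually the more convenient order when iterating over the subsets one after another.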

Dataset features

Every configuration has the same two features:

  • en: string
  • vi: string
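
A minimal sketch of reading rows with these two fields, assuming the Hugging Face `datasets` package is installed and the repository is reachable under the id shown at the top of this page. The helper names `load_pairs` and `is_valid_pair` are introduced here for illustration. Streaming is used so that a full subset does not have to be downloaded first.

```python
def is_valid_pair(row: dict) -> bool:
    """Check that a row has exactly the two string fields listed above."""
    return set(row) == {"en", "vi"} and all(
        isinstance(row[key], str) for key in ("en", "vi")
    )


def load_pairs(config_name: str = "set_21", limit: int = 3):
    """Stream the first `limit` (en, vi) pairs of one configuration."""
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    ds = load_dataset(
        "iamnguyen/mt_pubmed", config_name, split="train", streaming=True
    )
    pairs = []
    for row in ds:
        if not is_valid_pair(row):
            raise ValueError(f"unexpected row layout: {sorted(row)}")
        pairs.append((row["en"], row["vi"]))
        if len(pairs) >= limit:
            break
    return pairs
```

Calling `load_pairs("set_21")` needs network access; set_21 is the default here only because it is the smallest configuration.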

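Dataset splits

Each configuration has a single training split (train), described by the following fields:

  • num_examples: each train split contains 1,000,000 examples, except set_21, which contains 87,006.
  • num_bytes: the size of the train split in bytes. It varies by configuration, from 1,442,665,319 (set_6) to 1,528,776,855 (set_10) for the 1,000,000-example configurations; set_21 is 122,577,634.
  • download_size: the size in bytes of the files that are downloaded.
  • dataset_size: the actual size of the dataset in bytes. In the metadata it equals num_bytes for every configuration.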
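
The split figures imply a total of 20 × 1,000,000 + 87,006 = 20,087,006 examples. The short sketch below repeats that arithmetic and, using the set_21 values from the metadata, compares download size with dataset size. The constant names are introduced here for illustration.

```python
# Example counts from the split description above.
FULL_CONFIGS = 20        # set_1 through set_20
FULL_SIZE = 1_000_000    # examples in each of those configurations
LAST_SIZE = 87_006       # examples in set_21

total_examples = FULL_CONFIGS * FULL_SIZE + LAST_SIZE

# Byte counts for set_21, copied from the metadata block.
SET_21_DATASET_SIZE = 122_577_634
SET_21_DOWNLOAD_SIZE = 64_514_389

# Fraction of the dataset size that actually has to be downloaded.
download_ratio = SET_21_DOWNLOAD_SIZE / SET_21_DATASET_SIZE

print(f"total examples: {total_examples:,}")
print(f"set_21 download/dataset ratio: {download_ratio:.3f}")
```

For set_21 the download is a little over half of the dataset size.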

Dataset file paths

The data file path follows the same pattern in every configuration, for example:

  • set_1: set_1/train-*
  • set_10: set_10/train-*
  • ...
  • set_21: set_21/train-*
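
To illustrate how such a pattern selects files, the sketch below matches paths against `<config>/train-*` with Python's `fnmatch`. This is an approximation: the Hugging Face tooling resolves patterns with its own globbing rules, and the shard file names used in the usage note are hypothetical, since the listing does not show actual file names.

```python
from fnmatch import fnmatchcase

# Config names as described above: set_1 through set_21.
CONFIG_NAMES = [f"set_{i}" for i in range(1, 22)]


def pattern_for(config_name: str) -> str:
    """Build the data file pattern for one configuration."""
    return f"{config_name}/train-*"


def config_of(path: str, config_names=CONFIG_NAMES):
    """Return the config whose pattern matches `path`, or None."""
    for name in config_names:
        if fnmatchcase(path, pattern_for(name)):
            return name
    return None
```

With a hypothetical shard name, `config_of("set_10/train-00000-of-00004.parquet")` returns `"set_10"`. The pattern for set_1 does not match that path, because the directory separator in `set_1/` has to match literally.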