gayanin/pubmed-abstracts-dist-noised

Source: Hugging Face. Updated 2024-02-12; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/gayanin/pubmed-abstracts-dist-noised
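A minimal sketch of loading one configuration with the Hugging Face `datasets` library, assuming the library is installed (`pip install datasets`) and the Hub or the mirror above is reachable. The configuration and split names come from the card; the helper names `load_kwargs` and `load_split` are hypothetical. Names are validated locally so that a typo fails before any download starts.

```python
# Sketch: load one config/split of gayanin/pubmed-abstracts-dist-noised.
# Helper names are illustrative; only datasets.load_dataset is a real API.
DATASET_ID = "gayanin/pubmed-abstracts-dist-noised"

# Configuration names as listed on the dataset card.
CONFIGS = (
    [f"babylon-{i:02d}" for i in range(1, 4)]
    + [f"gcd-{i:02d}" for i in range(1, 5)]
    + [f"kaggle-{i:02d}" for i in range(1, 5)]
)
SPLITS = ("train", "test", "validation")


def load_kwargs(config: str, split: str) -> dict:
    """Check config/split against the card and build load_dataset keyword arguments."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    return {"path": DATASET_ID, "name": config, "split": split}


def load_split(config: str, split: str = "train"):
    """Download (or read from the local cache) one split as a datasets.Dataset."""
    # Imported lazily so load_kwargs works without the library or network access.
    from datasets import load_dataset

    return load_dataset(**load_kwargs(config, split))
```

For example, `load_split("babylon-01", "validation")` should return 9341 rows with the string columns `refs` and `trans`. `huggingface_hub` reads the `HF_ENDPOINT` environment variable, so setting it to `https://hf-mirror.com` before importing `datasets` is one way to route downloads through the mirror linked above.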
Resource description (dataset card metadata):
---
dataset_info:
- config_name: babylon-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18966629
    num_examples: 74724
  - name: test
    num_bytes: 2498780
    num_examples: 9341
  - name: validation
    num_bytes: 2430470
    num_examples: 9341
  download_size: 13371241
  dataset_size: 23895879
- config_name: babylon-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 19119149
    num_examples: 74724
  - name: test
    num_bytes: 2518943
    num_examples: 9341
  - name: validation
    num_bytes: 2450189
    num_examples: 9341
  download_size: 13665855
  dataset_size: 24088281
- config_name: babylon-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 19273175
    num_examples: 74724
  - name: test
    num_bytes: 2539404
    num_examples: 9341
  - name: validation
    num_bytes: 2470170
    num_examples: 9341
  download_size: 13917268
  dataset_size: 24282749
- config_name: gcd-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18762932
    num_examples: 74724
  - name: test
    num_bytes: 2470354
    num_examples: 9341
  - name: validation
    num_bytes: 2404075
    num_examples: 9341
  download_size: 13219782
  dataset_size: 23637361
- config_name: gcd-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18711610
    num_examples: 74724
  - name: test
    num_bytes: 2464019
    num_examples: 9341
  - name: validation
    num_bytes: 2397279
    num_examples: 9341
  download_size: 13357450
  dataset_size: 23572908
- config_name: gcd-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18656483
    num_examples: 74724
  - name: test
    num_bytes: 2458101
    num_examples: 9341
  - name: validation
    num_bytes: 2391598
    num_examples: 9341
  download_size: 13450620
  dataset_size: 23506182
- config_name: gcd-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18607987
    num_examples: 74724
  - name: test
    num_bytes: 2452163
    num_examples: 9341
  - name: validation
    num_bytes: 2382726
    num_examples: 9341
  download_size: 13518201
  dataset_size: 23442876
- config_name: kaggle-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18741304
    num_examples: 74724
  - name: test
    num_bytes: 2468049
    num_examples: 9341
  - name: validation
    num_bytes: 2401399
    num_examples: 9341
  download_size: 13191893
  dataset_size: 23610752
- config_name: kaggle-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18668842
    num_examples: 74724
  - name: test
    num_bytes: 2458530
    num_examples: 9341
  - name: validation
    num_bytes: 2391012
    num_examples: 9341
  download_size: 13313844
  dataset_size: 23518384
- config_name: kaggle-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18598440
    num_examples: 74724
  - name: test
    num_bytes: 2449161
    num_examples: 9341
  - name: validation
    num_bytes: 2382943
    num_examples: 9341
  download_size: 13399488
  dataset_size: 23430544
- config_name: kaggle-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18520899
    num_examples: 74724
  - name: test
    num_bytes: 2443154
    num_examples: 9341
  - name: validation
    num_bytes: 2372869
    num_examples: 9341
  download_size: 13447691
  dataset_size: 23336922
configs:
- config_name: babylon-01
  data_files:
  - split: train
    path: babylon-01/train-*
  - split: test
    path: babylon-01/test-*
  - split: validation
    path: babylon-01/validation-*
- config_name: babylon-02
  data_files:
  - split: train
    path: babylon-02/train-*
  - split: test
    path: babylon-02/test-*
  - split: validation
    path: babylon-02/validation-*
- config_name: babylon-03
  data_files:
  - split: train
    path: babylon-03/train-*
  - split: test
    path: babylon-03/test-*
  - split: validation
    path: babylon-03/validation-*
- config_name: gcd-01
  data_files:
  - split: train
    path: gcd-01/train-*
  - split: test
    path: gcd-01/test-*
  - split: validation
    path: gcd-01/validation-*
- config_name: gcd-02
  data_files:
  - split: train
    path: gcd-02/train-*
  - split: test
    path: gcd-02/test-*
  - split: validation
    path: gcd-02/validation-*
- config_name: gcd-03
  data_files:
  - split: train
    path: gcd-03/train-*
  - split: test
    path: gcd-03/test-*
  - split: validation
    path: gcd-03/validation-*
- config_name: gcd-04
  data_files:
  - split: train
    path: gcd-04/train-*
  - split: test
    path: gcd-04/test-*
  - split: validation
    path: gcd-04/validation-*
- config_name: kaggle-01
  data_files:
  - split: train
    path: kaggle-01/train-*
  - split: test
    path: kaggle-01/test-*
  - split: validation
    path: kaggle-01/validation-*
- config_name: kaggle-02
  data_files:
  - split: train
    path: kaggle-02/train-*
  - split: test
    path: kaggle-02/test-*
  - split: validation
    path: kaggle-02/validation-*
- config_name: kaggle-03
  data_files:
  - split: train
    path: kaggle-03/train-*
  - split: test
    path: kaggle-03/test-*
  - split: validation
    path: kaggle-03/validation-*
- config_name: kaggle-04
  data_files:
  - split: train
    path: kaggle-04/train-*
  - split: test
    path: kaggle-04/test-*
  - split: validation
    path: kaggle-04/validation-*
---
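The `configs` block maps each split to a glob pattern inside a per-configuration directory. The sketch below illustrates how such patterns select files, using the standard-library `fnmatch.fnmatchcase`; `datasets` resolves these patterns itself, so this only demonstrates the mapping. The Parquet shard names are hypothetical, since the card lists only the patterns. Note that `fnmatch`'s `*` also matches `/`, which makes no difference for these flat patterns.

```python
from fnmatch import fnmatchcase

SPLITS = ("train", "test", "validation")


def split_patterns(config: str) -> dict:
    """Rebuild the data_files mapping that the card declares for one config."""
    return {split: f"{config}/{split}-*" for split in SPLITS}


def assign_files(config: str, files: list) -> dict:
    """Group repository paths by the split whose pattern they match (case-sensitive)."""
    return {
        split: sorted(f for f in files if fnmatchcase(f, pattern))
        for split, pattern in split_patterns(config).items()
    }


# Hypothetical repository listing; real shard names may differ.
repo_files = [
    "README.md",
    "babylon-01/train-00000-of-00001.parquet",
    "babylon-01/test-00000-of-00001.parquet",
    "babylon-01/validation-00000-of-00001.parquet",
    "babylon-02/train-00000-of-00001.parquet",
]
print(assign_files("babylon-01", repo_files))
```

Files belonging to other configurations, and top-level files such as the README, match none of the three patterns and are ignored.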
Provided by:
gayanin
Summary of the original information

Dataset overview

Dataset configurations

babylon-01

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 18966629 bytes, 74724 examples
    • test: 2498780 bytes, 9341 examples
    • validation: 2430470 bytes, 9341 examples
  • Download size: 13371241 bytes
  • Dataset size: 23895879 bytes
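The figures in each entry are mutually consistent: for babylon-01 the three split sizes add up exactly to the dataset size, and the example counts (74724 + 9341 + 9341 = 93406 rows) give an 80/10/10 train/test/validation split. A small check with the numbers copied from the entry above; `split_summary` is a hypothetical helper.

```python
# babylon-01 figures from the card: split -> (num_bytes, num_examples).
BABYLON_01 = {
    "train": (18966629, 74724),
    "test": (2498780, 9341),
    "validation": (2430470, 9341),
}
BABYLON_01_DATASET_SIZE = 23895879


def split_summary(splits: dict, dataset_size: int) -> dict:
    """Verify split bytes sum to dataset_size; return each split's share of all examples."""
    total_bytes = sum(num_bytes for num_bytes, _ in splits.values())
    if total_bytes != dataset_size:
        raise ValueError(f"splits sum to {total_bytes} bytes, card says {dataset_size}")
    total_examples = sum(n for _, n in splits.values())
    return {name: n / total_examples for name, (_, n) in splits.items()}


shares = split_summary(BABYLON_01, BABYLON_01_DATASET_SIZE)
print({name: f"{share:.1%}" for name, share in shares.items()})
# {'train': '80.0%', 'test': '10.0%', 'validation': '10.0%'}
```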

babylon-02

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 19119149 bytes, 74724 examples
    • test: 2518943 bytes, 9341 examples
    • validation: 2450189 bytes, 9341 examples
  • Download size: 13665855 bytes
  • Dataset size: 24088281 bytes
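Every configuration exposes the same two string columns, `refs` and `trans`, but the card does not define them. Assuming (from the dataset name only) that `trans` is a noised version of the reference text in `refs`, word error rate (WER) is one natural way to quantify how far a row drifts from its reference. A self-contained sketch using word-level Levenshtein distance; the helper names are hypothetical and the sentence pair is invented for illustration, not taken from the dataset.

```python
def word_edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over whitespace-separated words."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances from an empty reference prefix
    for i, rw in enumerate(r, start=1):
        curr = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete a reference word
                curr[j - 1] + 1,           # insert a hypothesis word
                prev[j - 1] + (rw != hw),  # substitute (free when the words match)
            )
        prev = curr
    return prev[len(h)]


def word_error_rate(ref: str, hyp: str) -> float:
    """WER = word edit distance / number of reference words."""
    n = len(ref.split())
    if n == 0:
        raise ValueError("reference has no words")
    return word_edit_distance(ref, hyp) / n


# Invented example: "aspirin" becomes "a spring" (one substitution + one insertion).
ref = "the patient was given aspirin daily"
hyp = "the patient was given a spring daily"
print(word_edit_distance(ref, hyp), round(word_error_rate(ref, hyp), 3))  # 2 0.333
```

Applied to a loaded split, the same function can be run over `row["refs"]` and `row["trans"]` for each row to compare how strongly different configurations perturb the text.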

babylon-03

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 19273175 bytes, 74724 examples
    • test: 2539404 bytes, 9341 examples
    • validation: 2470170 bytes, 9341 examples
  • Download size: 13917268 bytes
  • Dataset size: 24282749 bytes

gcd-01

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 18762932 bytes, 74724 examples
    • test: 2470354 bytes, 9341 examples
    • validation: 2404075 bytes, 9341 examples
  • Download size: 13219782 bytes
  • Dataset size: 23637361 bytes
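Dividing `num_bytes` by `num_examples` gives the average recorded size of one row, counting both string fields together. For gcd-01 this is roughly 251 bytes in train, 264 in test and 257 in validation. A sketch with the gcd-01 figures copied from the entry above; `bytes_per_example` is a hypothetical helper.

```python
# gcd-01 figures from the card: split -> (num_bytes, num_examples).
GCD_01 = {
    "train": (18762932, 74724),
    "test": (2470354, 9341),
    "validation": (2404075, 9341),
}


def bytes_per_example(splits: dict) -> dict:
    """Average recorded bytes per row (refs + trans together) for each split."""
    return {name: num_bytes / n for name, (num_bytes, n) in splits.items()}


for name, avg in bytes_per_example(GCD_01).items():
    print(f"{name}: {avg:.1f} bytes per example")
```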

gcd-02

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 18711610 bytes, 74724 examples
    • test: 2464019 bytes, 9341 examples
    • validation: 2397279 bytes, 9341 examples
  • Download size: 13357450 bytes
  • Dataset size: 23572908 bytes

gcd-03

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 18656483 bytes, 74724 examples
    • test: 2458101 bytes, 9341 examples
    • validation: 2391598 bytes, 9341 examples
  • Download size: 13450620 bytes
  • Dataset size: 23506182 bytes

gcd-04

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 18607987 bytes, 74724 examples
    • test: 2452163 bytes, 9341 examples
    • validation: 2382726 bytes, 9341 examples
  • Download size: 13518201 bytes
  • Dataset size: 23442876 bytes
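The configuration names share a `<prefix>-<NN>` pattern: three babylon variants and four each for gcd and kaggle. The card does not say what the prefixes or the numeric suffixes denote, so the sketch below only groups the names structurally; reading them as noise sources or noise levels would be an assumption. `group_configs` is a hypothetical helper.

```python
from collections import defaultdict

CONFIG_NAMES = [
    "babylon-01", "babylon-02", "babylon-03",
    "gcd-01", "gcd-02", "gcd-03", "gcd-04",
    "kaggle-01", "kaggle-02", "kaggle-03", "kaggle-04",
]


def group_configs(names):
    """Split '<prefix>-<NN>' names and collect the numeric suffixes per prefix."""
    groups = defaultdict(list)
    for name in names:
        prefix, _, suffix = name.rpartition("-")
        groups[prefix].append(int(suffix))
    return {prefix: sorted(nums) for prefix, nums in groups.items()}


print(group_configs(CONFIG_NAMES))
# {'babylon': [1, 2, 3], 'gcd': [1, 2, 3, 4], 'kaggle': [1, 2, 3, 4]}
```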

kaggle-01

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 18741304 bytes, 74724 examples
    • test: 2468049 bytes, 9341 examples
    • validation: 2401399 bytes, 9341 examples
  • Download size: 13191893 bytes
  • Dataset size: 23610752 bytes

kaggle-02

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 18668842 bytes, 74724 examples
    • test: 2458530 bytes, 9341 examples
    • validation: 2391012 bytes, 9341 examples
  • Download size: 13313844 bytes
  • Dataset size: 23518384 bytes

kaggle-03

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 18598440 bytes, 74724 examples
    • test: 2449161 bytes, 9341 examples
    • validation: 2382943 bytes, 9341 examples
  • Download size: 13399488 bytes
  • Dataset size: 23430544 bytes

kaggle-04

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 18520899 bytes, 74724 examples
    • test: 2443154 bytes, 9341 examples
    • validation: 2372869 bytes, 9341 examples
  • Download size: 13447691 bytes
  • Dataset size: 23336922 bytes
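In Hugging Face `dataset_info`, `download_size` is the size of the files fetched from the Hub and `dataset_size` is the size of the prepared splits. Their ratio therefore shows how compactly each configuration is stored for download. Across all eleven configurations it falls between about 0.56 and 0.58. A sketch with the figures copied from the entries above; `download_ratios` is a hypothetical helper.

```python
# (download_size, dataset_size) in bytes, copied from the card.
SIZES = {
    "babylon-01": (13371241, 23895879),
    "babylon-02": (13665855, 24088281),
    "babylon-03": (13917268, 24282749),
    "gcd-01": (13219782, 23637361),
    "gcd-02": (13357450, 23572908),
    "gcd-03": (13450620, 23506182),
    "gcd-04": (13518201, 23442876),
    "kaggle-01": (13191893, 23610752),
    "kaggle-02": (13313844, 23518384),
    "kaggle-03": (13399488, 23430544),
    "kaggle-04": (13447691, 23336922),
}


def download_ratios(sizes: dict) -> dict:
    """download_size / dataset_size for each configuration."""
    return {name: dl / ds for name, (dl, ds) in sizes.items()}


# List configurations from most to least compact download.
for name, ratio in sorted(download_ratios(SIZES).items(), key=lambda item: item[1]):
    print(f"{name:<11}{ratio:.3f}")
```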