
gayanin/pubmed-abstracts-dist-noised-v2

Source: Hugging Face. Updated 2024-02-12; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/gayanin/pubmed-abstracts-dist-noised-v2
Resource description:
---
dataset_info:
- config_name: babylon-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6595134
    num_examples: 24908
  - name: test
    num_bytes: 816662
    num_examples: 3113
  - name: validation
    num_bytes: 798507
    num_examples: 3114
  download_size: 4608640
  dataset_size: 8210303
- config_name: babylon-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6649302
    num_examples: 24908
  - name: test
    num_bytes: 823956
    num_examples: 3113
  - name: validation
    num_bytes: 804360
    num_examples: 3114
  download_size: 4709047
  dataset_size: 8277618
- config_name: babylon-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6704389
    num_examples: 24908
  - name: test
    num_bytes: 830323
    num_examples: 3113
  - name: validation
    num_bytes: 811675
    num_examples: 3114
  download_size: 4797674
  dataset_size: 8346387
- config_name: gcd-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6448182
    num_examples: 24908
  - name: test
    num_bytes: 813378
    num_examples: 3113
  - name: validation
    num_bytes: 802452
    num_examples: 3114
  download_size: 4503363
  dataset_size: 8064012
- config_name: gcd-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6431922
    num_examples: 24908
  - name: test
    num_bytes: 810489
    num_examples: 3113
  - name: validation
    num_bytes: 800488
    num_examples: 3114
  download_size: 4550524
  dataset_size: 8042899
- config_name: gcd-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6416391
    num_examples: 24908
  - name: test
    num_bytes: 808759
    num_examples: 3113
  - name: validation
    num_bytes: 797257
    num_examples: 3114
  download_size: 4584136
  dataset_size: 8022407
- config_name: kaggle-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5836542
    num_examples: 24908
  - name: test
    num_bytes: 803285
    num_examples: 3114
  - name: validation
    num_bytes: 801836
    num_examples: 3114
  download_size: 4176206
  dataset_size: 7441663
- config_name: kaggle-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5811500
    num_examples: 24908
  - name: test
    num_bytes: 801272
    num_examples: 3114
  - name: validation
    num_bytes: 798472
    num_examples: 3114
  download_size: 4210456
  dataset_size: 7411244
- config_name: kaggle-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5789912
    num_examples: 24908
  - name: test
    num_bytes: 797824
    num_examples: 3114
  - name: validation
    num_bytes: 796074
    num_examples: 3114
  download_size: 4237457
  dataset_size: 7383810
- config_name: kaggle-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5761003
    num_examples: 24908
  - name: test
    num_bytes: 794947
    num_examples: 3114
  - name: validation
    num_bytes: 792732
    num_examples: 3114
  download_size: 4253250
  dataset_size: 7348682
configs:
- config_name: babylon-01
  data_files:
  - split: train
    path: babylon-01/train-*
  - split: test
    path: babylon-01/test-*
  - split: validation
    path: babylon-01/validation-*
- config_name: babylon-02
  data_files:
  - split: train
    path: babylon-02/train-*
  - split: test
    path: babylon-02/test-*
  - split: validation
    path: babylon-02/validation-*
- config_name: babylon-03
  data_files:
  - split: train
    path: babylon-03/train-*
  - split: test
    path: babylon-03/test-*
  - split: validation
    path: babylon-03/validation-*
- config_name: gcd-01
  data_files:
  - split: train
    path: gcd-01/train-*
  - split: test
    path: gcd-01/test-*
  - split: validation
    path: gcd-01/validation-*
- config_name: gcd-02
  data_files:
  - split: train
    path: gcd-02/train-*
  - split: test
    path: gcd-02/test-*
  - split: validation
    path: gcd-02/validation-*
- config_name: gcd-03
  data_files:
  - split: train
    path: gcd-03/train-*
  - split: test
    path: gcd-03/test-*
  - split: validation
    path: gcd-03/validation-*
- config_name: kaggle-01
  data_files:
  - split: train
    path: kaggle-01/train-*
  - split: test
    path: kaggle-01/test-*
  - split: validation
    path: kaggle-01/validation-*
- config_name: kaggle-02
  data_files:
  - split: train
    path: kaggle-02/train-*
  - split: test
    path: kaggle-02/test-*
  - split: validation
    path: kaggle-02/validation-*
- config_name: kaggle-03
  data_files:
  - split: train
    path: kaggle-03/train-*
  - split: test
    path: kaggle-03/test-*
  - split: validation
    path: kaggle-03/validation-*
- config_name: kaggle-04
  data_files:
  - split: train
    path: kaggle-04/train-*
  - split: test
    path: kaggle-04/test-*
  - split: validation
    path: kaggle-04/validation-*
---
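
The card metadata above maps each config name to per-split file patterns, which is the layout the Hugging Face `datasets` library reads. Below is a minimal sketch of loading one config, assuming the `datasets` package is installed and the repository is still reachable under the ID in the title. The names `CONFIG_NAMES`, `REPO_ID`, and `load_config` are illustrative helpers, not part of the dataset.

```python
# Config names as listed in the card metadata: three "babylon",
# three "gcd", and four "kaggle" variants.
CONFIG_NAMES = (
    [f"babylon-{i:02d}" for i in range(1, 4)]
    + [f"gcd-{i:02d}" for i in range(1, 4)]
    + [f"kaggle-{i:02d}" for i in range(1, 5)]
)

REPO_ID = "gayanin/pubmed-abstracts-dist-noised-v2"


def load_config(name: str, split: str = "train"):
    """Load one split of one config (hypothetical helper; needs network access)."""
    if name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {name!r}")
    # Imported lazily so the name check works without the package installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split=split)
```

If the card metadata is accurate, `load_config("babylon-01", split="validation")` should return a dataset whose columns are `refs` and `trans`, both strings.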
Provider:
gayanin
Summary of original information

Dataset overview

Dataset configurations

babylon-01

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 6595134
      • Examples: 24908
    • test:
      • Bytes: 816662
      • Examples: 3113
    • validation:
      • Bytes: 798507
      • Examples: 3114
  • Download size: 4608640 bytes
  • Dataset size: 8210303 bytes

babylon-02

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 6649302
      • Examples: 24908
    • test:
      • Bytes: 823956
      • Examples: 3113
    • validation:
      • Bytes: 804360
      • Examples: 3114
  • Download size: 4709047 bytes
  • Dataset size: 8277618 bytes

babylon-03

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 6704389
      • Examples: 24908
    • test:
      • Bytes: 830323
      • Examples: 3113
    • validation:
      • Bytes: 811675
      • Examples: 3114
  • Download size: 4797674 bytes
  • Dataset size: 8346387 bytes

gcd-01

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 6448182
      • Examples: 24908
    • test:
      • Bytes: 813378
      • Examples: 3113
    • validation:
      • Bytes: 802452
      • Examples: 3114
  • Download size: 4503363 bytes
  • Dataset size: 8064012 bytes

gcd-02

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 6431922
      • Examples: 24908
    • test:
      • Bytes: 810489
      • Examples: 3113
    • validation:
      • Bytes: 800488
      • Examples: 3114
  • Download size: 4550524 bytes
  • Dataset size: 8042899 bytes

gcd-03

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 6416391
      • Examples: 24908
    • test:
      • Bytes: 808759
      • Examples: 3113
    • validation:
      • Bytes: 797257
      • Examples: 3114
  • Download size: 4584136 bytes
  • Dataset size: 8022407 bytes

kaggle-01

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 5836542
      • Examples: 24908
    • test:
      • Bytes: 803285
      • Examples: 3114
    • validation:
      • Bytes: 801836
      • Examples: 3114
  • Download size: 4176206 bytes
  • Dataset size: 7441663 bytes

kaggle-02

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 5811500
      • Examples: 24908
    • test:
      • Bytes: 801272
      • Examples: 3114
    • validation:
      • Bytes: 798472
      • Examples: 3114
  • Download size: 4210456 bytes
  • Dataset size: 7411244 bytes

kaggle-03

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 5789912
      • Examples: 24908
    • test:
      • Bytes: 797824
      • Examples: 3114
    • validation:
      • Bytes: 796074
      • Examples: 3114
  • Download size: 4237457 bytes
  • Dataset size: 7383810 bytes

kaggle-04

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train:
      • Bytes: 5761003
      • Examples: 24908
    • test:
      • Bytes: 794947
      • Examples: 3114
    • validation:
      • Bytes: 792732
      • Examples: 3114
  • Download size: 4253250 bytes
  • Dataset size: 7348682 bytes
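
In the figures listed above, each config's dataset size appears to equal the sum of its three split byte counts, and the example counts work out to roughly an 80/10/10 split. The sketch below checks both observations using numbers copied from this listing. The names `SIZES`, `COUNTS`, `size_mismatches`, and `split_fractions` are hypothetical helpers introduced only for illustration.

```python
# (train, test, validation) byte counts and the reported dataset size,
# copied from the per-config listing.
SIZES = {
    "babylon-01": ((6595134, 816662, 798507), 8210303),
    "babylon-02": ((6649302, 823956, 804360), 8277618),
    "babylon-03": ((6704389, 830323, 811675), 8346387),
    "gcd-01": ((6448182, 813378, 802452), 8064012),
    "gcd-02": ((6431922, 810489, 800488), 8042899),
    "gcd-03": ((6416391, 808759, 797257), 8022407),
    "kaggle-01": ((5836542, 803285, 801836), 7441663),
    "kaggle-02": ((5811500, 801272, 798472), 7411244),
    "kaggle-03": ((5789912, 797824, 796074), 7383810),
    "kaggle-04": ((5761003, 794947, 792732), 7348682),
}

# (train, test, validation) example counts. The kaggle configs list
# 3114 test examples; the babylon and gcd configs list 3113.
COUNTS = {
    "babylon/gcd": (24908, 3113, 3114),
    "kaggle": (24908, 3114, 3114),
}


def size_mismatches(sizes):
    """Return config names whose split bytes do not sum to the dataset size."""
    return [name for name, (splits, total) in sizes.items() if sum(splits) != total]


def split_fractions(counts):
    """Return each split's share of the total examples, rounded to 4 places."""
    total = sum(counts)
    return tuple(round(c / total, 4) for c in counts)


print("configs with inconsistent sizes:", size_mismatches(SIZES))
for family, counts in COUNTS.items():
    print(family, split_fractions(counts))
```

The download sizes are smaller than the dataset sizes throughout, which would be consistent with compressed storage on the hub, though the listing itself does not say so.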