gayanin/pubmed-abstracts-noised-with-prob-dist-v2

Source: Hugging Face, updated 2024-02-09, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/gayanin/pubmed-abstracts-noised-with-prob-dist-v2
Resource description:
---
dataset_info:
- config_name: babylon-prob-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6298703
    num_examples: 24908
  - name: test
    num_bytes: 794582
    num_examples: 3113
  - name: validation
    num_bytes: 784437
    num_examples: 3114
  download_size: 4438345
  dataset_size: 7877722
- config_name: babylon-prob-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6131860
    num_examples: 24908
  - name: test
    num_bytes: 772976
    num_examples: 3113
  - name: validation
    num_bytes: 763170
    num_examples: 3114
  download_size: 4431105
  dataset_size: 7668006
- config_name: babylon-prob-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5963382
    num_examples: 24908
  - name: test
    num_bytes: 751530
    num_examples: 3113
  - name: validation
    num_bytes: 743139
    num_examples: 3114
  download_size: 4411104
  dataset_size: 7458051
- config_name: babylon-prob-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5794478
    num_examples: 24908
  - name: test
    num_bytes: 730929
    num_examples: 3113
  - name: validation
    num_bytes: 720849
    num_examples: 3114
  download_size: 4374101
  dataset_size: 7246256
- config_name: babylon-prob-05
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5634718
    num_examples: 24908
  - name: test
    num_bytes: 708651
    num_examples: 3113
  - name: validation
    num_bytes: 701862
    num_examples: 3114
  download_size: 4336094
  dataset_size: 7045231
- config_name: gcd-prob-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5623412
    num_examples: 24908
  - name: test
    num_bytes: 774353
    num_examples: 3114
  - name: validation
    num_bytes: 772363
    num_examples: 3114
  download_size: 4026552
  dataset_size: 7170128
- config_name: gcd-prob-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5386733
    num_examples: 24908
  - name: test
    num_bytes: 742236
    num_examples: 3114
  - name: validation
    num_bytes: 739965
    num_examples: 3114
  download_size: 3926230
  dataset_size: 6868934
- config_name: gcd-prob-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5151749
    num_examples: 24908
  - name: test
    num_bytes: 709209
    num_examples: 3114
  - name: validation
    num_bytes: 706547
    num_examples: 3114
  download_size: 3806924
  dataset_size: 6567505
- config_name: gcd-prob-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 4914469
    num_examples: 24908
  - name: test
    num_bytes: 678027
    num_examples: 3114
  - name: validation
    num_bytes: 676635
    num_examples: 3114
  download_size: 3674828
  dataset_size: 6269131
- config_name: gcd-prob-05
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 4682536
    num_examples: 24908
  - name: test
    num_bytes: 643943
    num_examples: 3114
  - name: validation
    num_bytes: 644068
    num_examples: 3114
  download_size: 3536779
  dataset_size: 5970547
- config_name: kaggle-prob-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6254746
    num_examples: 24908
  - name: test
    num_bytes: 787330
    num_examples: 3113
  - name: validation
    num_bytes: 783533
    num_examples: 3114
  download_size: 4393817
  dataset_size: 7825609
- config_name: kaggle-prob-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 6002616
    num_examples: 24908
  - name: test
    num_bytes: 753845
    num_examples: 3113
  - name: validation
    num_bytes: 751722
    num_examples: 3114
  download_size: 4291924
  dataset_size: 7508183
- config_name: kaggle-prob-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5747484
    num_examples: 24908
  - name: test
    num_bytes: 722481
    num_examples: 3113
  - name: validation
    num_bytes: 719629
    num_examples: 3114
  download_size: 4175521
  dataset_size: 7189594
- config_name: kaggle-prob-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5496897
    num_examples: 24908
  - name: test
    num_bytes: 692009
    num_examples: 3113
  - name: validation
    num_bytes: 688458
    num_examples: 3114
  download_size: 4054340
  dataset_size: 6877364
- config_name: kaggle-prob-05
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 5243270
    num_examples: 24908
  - name: test
    num_bytes: 658650
    num_examples: 3113
  - name: validation
    num_bytes: 658178
    num_examples: 3114
  download_size: 3911586
  dataset_size: 6560098
configs:
- config_name: babylon-prob-01
  data_files:
  - split: train
    path: babylon-prob-01/train-*
  - split: test
    path: babylon-prob-01/test-*
  - split: validation
    path: babylon-prob-01/validation-*
- config_name: babylon-prob-02
  data_files:
  - split: train
    path: babylon-prob-02/train-*
  - split: test
    path: babylon-prob-02/test-*
  - split: validation
    path: babylon-prob-02/validation-*
- config_name: babylon-prob-03
  data_files:
  - split: train
    path: babylon-prob-03/train-*
  - split: test
    path: babylon-prob-03/test-*
  - split: validation
    path: babylon-prob-03/validation-*
- config_name: babylon-prob-04
  data_files:
  - split: train
    path: babylon-prob-04/train-*
  - split: test
    path: babylon-prob-04/test-*
  - split: validation
    path: babylon-prob-04/validation-*
- config_name: babylon-prob-05
  data_files:
  - split: train
    path: babylon-prob-05/train-*
  - split: test
    path: babylon-prob-05/test-*
  - split: validation
    path: babylon-prob-05/validation-*
- config_name: gcd-prob-01
  data_files:
  - split: train
    path: gcd-prob-01/train-*
  - split: test
    path: gcd-prob-01/test-*
  - split: validation
    path: gcd-prob-01/validation-*
- config_name: gcd-prob-02
  data_files:
  - split: train
    path: gcd-prob-02/train-*
  - split: test
    path: gcd-prob-02/test-*
  - split: validation
    path: gcd-prob-02/validation-*
- config_name: gcd-prob-03
  data_files:
  - split: train
    path: gcd-prob-03/train-*
  - split: test
    path: gcd-prob-03/test-*
  - split: validation
    path: gcd-prob-03/validation-*
- config_name: gcd-prob-04
  data_files:
  - split: train
    path: gcd-prob-04/train-*
  - split: test
    path: gcd-prob-04/test-*
  - split: validation
    path: gcd-prob-04/validation-*
- config_name: gcd-prob-05
  data_files:
  - split: train
    path: gcd-prob-05/train-*
  - split: test
    path: gcd-prob-05/test-*
  - split: validation
    path: gcd-prob-05/validation-*
- config_name: kaggle-prob-01
  data_files:
  - split: train
    path: kaggle-prob-01/train-*
  - split: test
    path: kaggle-prob-01/test-*
  - split: validation
    path: kaggle-prob-01/validation-*
- config_name: kaggle-prob-02
  data_files:
  - split: train
    path: kaggle-prob-02/train-*
  - split: test
    path: kaggle-prob-02/test-*
  - split: validation
    path: kaggle-prob-02/validation-*
- config_name: kaggle-prob-03
  data_files:
  - split: train
    path: kaggle-prob-03/train-*
  - split: test
    path: kaggle-prob-03/test-*
  - split: validation
    path: kaggle-prob-03/validation-*
- config_name: kaggle-prob-04
  data_files:
  - split: train
    path: kaggle-prob-04/train-*
  - split: test
    path: kaggle-prob-04/test-*
  - split: validation
    path: kaggle-prob-04/validation-*
- config_name: kaggle-prob-05
  data_files:
  - split: train
    path: kaggle-prob-05/train-*
  - split: test
    path: kaggle-prob-05/test-*
  - split: validation
    path: kaggle-prob-05/validation-*
---
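The metadata above reports a byte count per split and a total `dataset_size` per configuration. As a quick sanity check, the sketch below confirms that the split sizes add up to the reported total for two of the configurations. The dictionary layout and the function name `size_is_consistent` are illustrative assumptions, not part of the dataset or of any library; the numbers are copied from the metadata.

```python
# Byte counts copied from the metadata for two configurations.
# The dictionary layout is an illustrative choice, not a library format.
METADATA = {
    "babylon-prob-01": {
        "splits": {"train": 6298703, "test": 794582, "validation": 784437},
        "dataset_size": 7877722,
    },
    "gcd-prob-05": {
        "splits": {"train": 4682536, "test": 643943, "validation": 644068},
        "dataset_size": 5970547,
    },
}


def size_is_consistent(entry):
    """Return True if the split byte counts sum to the reported dataset_size."""
    return sum(entry["splits"].values()) == entry["dataset_size"]


if __name__ == "__main__":
    for name, entry in METADATA.items():
        status = "consistent" if size_is_consistent(entry) else "MISMATCH"
        print(f"{name}: {status}")
```

The same check can be extended to the remaining configurations by adding their entries to `METADATA`.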
Provider:
gayanin
Summary of original information

Dataset overview

Dataset configurations

babylon-prob-01

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 6298703 bytes, 24908 examples
    • test: 794582 bytes, 3113 examples
    • validation: 784437 bytes, 3114 examples
  • Download size: 4438345 bytes
  • Dataset size: 7877722 bytes

babylon-prob-02

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 6131860 bytes, 24908 examples
    • test: 772976 bytes, 3113 examples
    • validation: 763170 bytes, 3114 examples
  • Download size: 4431105 bytes
  • Dataset size: 7668006 bytes

babylon-prob-03

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 5963382 bytes, 24908 examples
    • test: 751530 bytes, 3113 examples
    • validation: 743139 bytes, 3114 examples
  • Download size: 4411104 bytes
  • Dataset size: 7458051 bytes

babylon-prob-04

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 5794478 bytes, 24908 examples
    • test: 730929 bytes, 3113 examples
    • validation: 720849 bytes, 3114 examples
  • Download size: 4374101 bytes
  • Dataset size: 7246256 bytes

babylon-prob-05

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 5634718 bytes, 24908 examples
    • test: 708651 bytes, 3113 examples
    • validation: 701862 bytes, 3114 examples
  • Download size: 4336094 bytes
  • Dataset size: 7045231 bytes

gcd-prob-01

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 5623412 bytes, 24908 examples
    • test: 774353 bytes, 3114 examples
    • validation: 772363 bytes, 3114 examples
  • Download size: 4026552 bytes
  • Dataset size: 7170128 bytes

gcd-prob-02

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 5386733 bytes, 24908 examples
    • test: 742236 bytes, 3114 examples
    • validation: 739965 bytes, 3114 examples
  • Download size: 3926230 bytes
  • Dataset size: 6868934 bytes

gcd-prob-03

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 5151749 bytes, 24908 examples
    • test: 709209 bytes, 3114 examples
    • validation: 706547 bytes, 3114 examples
  • Download size: 3806924 bytes
  • Dataset size: 6567505 bytes

gcd-prob-04

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 4914469 bytes, 24908 examples
    • test: 678027 bytes, 3114 examples
    • validation: 676635 bytes, 3114 examples
  • Download size: 3674828 bytes
  • Dataset size: 6269131 bytes

gcd-prob-05

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 4682536 bytes, 24908 examples
    • test: 643943 bytes, 3114 examples
    • validation: 644068 bytes, 3114 examples
  • Download size: 3536779 bytes
  • Dataset size: 5970547 bytes

kaggle-prob-01

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 6254746 bytes, 24908 examples
    • test: 787330 bytes, 3113 examples
    • validation: 783533 bytes, 3114 examples
  • Download size: 4393817 bytes
  • Dataset size: 7825609 bytes

kaggle-prob-02

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 6002616 bytes, 24908 examples
    • test: 753845 bytes, 3113 examples
    • validation: 751722 bytes, 3114 examples
  • Download size: 4291924 bytes
  • Dataset size: 7508183 bytes

kaggle-prob-03

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 5747484 bytes, 24908 examples
    • test: 722481 bytes, 3113 examples
    • validation: 719629 bytes, 3114 examples
  • Download size: 4175521 bytes
  • Dataset size: 7189594 bytes

kaggle-prob-04

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 5496897 bytes, 24908 examples
    • test: 692009 bytes, 3113 examples
    • validation: 688458 bytes, 3114 examples
  • Download size: 4054340 bytes
  • Dataset size: 6877364 bytes

kaggle-prob-05

  • Features:
    • refs: string
    • trans: string
  • Splits:
    • train: 5243270 bytes, 24908 examples
    • test: 658650 bytes, 3113 examples
    • validation: 658178 bytes, 3114 examples
  • Download size: 3911586 bytes
  • Dataset size: 6560098 bytes
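
The fifteen configurations listed above follow a regular naming pattern: three prefixes (`babylon`, `gcd`, `kaggle`) combined with suffixes `01` to `05`. A minimal sketch of enumerating them and loading one split might look as follows. It assumes the Hugging Face `datasets` package is installed and the Hub is reachable; the helper names `config_names` and `load_config` are hypothetical, and this page does not document what the prefixes or the numeric suffixes mean.

```python
from itertools import product

REPO_ID = "gayanin/pubmed-abstracts-noised-with-prob-dist-v2"
SOURCES = ("babylon", "gcd", "kaggle")        # configuration name prefixes
LEVELS = ("01", "02", "03", "04", "05")       # numeric suffixes, meaning not documented here


def config_names():
    """Return the 15 configuration names in the order they are listed."""
    return [f"{source}-prob-{level}" for source, level in product(SOURCES, LEVELS)]


def load_config(name, split="train"):
    """Load one split of one configuration (hypothetical helper).

    Requires network access and the `datasets` package.
    """
    if name not in config_names():
        raise ValueError(f"unknown configuration: {name}")
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, name, split=split)


if __name__ == "__main__":
    for name in config_names():
        print(name)
```

Each loaded split should expose the two string columns described above, `refs` and `trans`, for example via `load_config("gcd-prob-03", split="validation")`.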