gayanin/pubmed-abstracts-noised-with-gcd-dist
Source: Hugging Face. Updated 2024-02-07. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/gayanin/pubmed-abstracts-noised-with-gcd-dist
Resource description:
---
dataset_info:
- config_name: prob-01
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 18059111
    num_examples: 74724
  - name: test
    num_bytes: 2313240
    num_examples: 9341
  - name: validation
    num_bytes: 2377221
    num_examples: 9341
  download_size: 12720153
  dataset_size: 22749572
- config_name: prob-02
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 17303468
    num_examples: 74724
  - name: test
    num_bytes: 2215701
    num_examples: 9341
  - name: validation
    num_bytes: 2278477
    num_examples: 9341
  download_size: 12401406
  dataset_size: 21797646
- config_name: prob-03
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 16548709
    num_examples: 74724
  - name: test
    num_bytes: 2119313
    num_examples: 9341
  - name: validation
    num_bytes: 2180352
    num_examples: 9341
  download_size: 12046866
  dataset_size: 20848374
- config_name: prob-04
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 15796152
    num_examples: 74724
  - name: test
    num_bytes: 2023161
    num_examples: 9341
  - name: validation
    num_bytes: 2076457
    num_examples: 9341
  download_size: 11644890
  dataset_size: 19895770
- config_name: prob-05
  features:
  - name: refs
    dtype: string
  - name: trans
    dtype: string
  splits:
  - name: train
    num_bytes: 15033370
    num_examples: 74724
  - name: test
    num_bytes: 1927033
    num_examples: 9341
  - name: validation
    num_bytes: 1984387
    num_examples: 9341
  download_size: 11205650
  dataset_size: 18944790
configs:
- config_name: prob-01
  data_files:
  - split: train
    path: prob-01/train-*
  - split: test
    path: prob-01/test-*
  - split: validation
    path: prob-01/validation-*
- config_name: prob-02
  data_files:
  - split: train
    path: prob-02/train-*
  - split: test
    path: prob-02/test-*
  - split: validation
    path: prob-02/validation-*
- config_name: prob-03
  data_files:
  - split: train
    path: prob-03/train-*
  - split: test
    path: prob-03/test-*
  - split: validation
    path: prob-03/validation-*
- config_name: prob-04
  data_files:
  - split: train
    path: prob-04/train-*
  - split: test
    path: prob-04/test-*
  - split: validation
    path: prob-04/validation-*
- config_name: prob-05
  data_files:
  - split: train
    path: prob-05/train-*
  - split: test
    path: prob-05/test-*
  - split: validation
    path: prob-05/validation-*
---
Provider:
gayanin
Summary of original information
Dataset overview
Dataset configurations
prob-01
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 18059111 bytes, 74724 examples
  - test: 2313240 bytes, 9341 examples
  - validation: 2377221 bytes, 9341 examples
- Download size: 12720153 bytes
- Dataset size: 22749572 bytes
prob-02
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 17303468 bytes, 74724 examples
  - test: 2215701 bytes, 9341 examples
  - validation: 2278477 bytes, 9341 examples
- Download size: 12401406 bytes
- Dataset size: 21797646 bytes
prob-03
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 16548709 bytes, 74724 examples
  - test: 2119313 bytes, 9341 examples
  - validation: 2180352 bytes, 9341 examples
- Download size: 12046866 bytes
- Dataset size: 20848374 bytes
prob-04
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 15796152 bytes, 74724 examples
  - test: 2023161 bytes, 9341 examples
  - validation: 2076457 bytes, 9341 examples
- Download size: 11644890 bytes
- Dataset size: 19895770 bytes
prob-05
- Features:
  - refs: string
  - trans: string
- Splits:
  - train: 15033370 bytes, 74724 examples
  - test: 1927033 bytes, 9341 examples
  - validation: 1984387 bytes, 9341 examples
- Download size: 11205650 bytes
- Dataset size: 18944790 bytes
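The figures listed for each configuration can be cross-checked with simple arithmetic. The sketch below is only an illustration: the constant and function names (`SPLIT_BYTES`, `size_matches`, `split_fractions`) are made up for this example, and the numbers are copied from the metadata above. It checks whether each reported dataset size equals the sum of its split sizes and computes each split's share of the examples.

```python
# Split sizes in bytes, copied from the dataset metadata above.
SPLIT_BYTES = {
    "prob-01": {"train": 18059111, "test": 2313240, "validation": 2377221},
    "prob-02": {"train": 17303468, "test": 2215701, "validation": 2278477},
    "prob-03": {"train": 16548709, "test": 2119313, "validation": 2180352},
    "prob-04": {"train": 15796152, "test": 2023161, "validation": 2076457},
    "prob-05": {"train": 15033370, "test": 1927033, "validation": 1984387},
}

# Reported dataset sizes in bytes, copied from the same metadata.
DATASET_SIZE = {
    "prob-01": 22749572,
    "prob-02": 21797646,
    "prob-03": 20848374,
    "prob-04": 19895770,
    "prob-05": 18944790,
}

# Example counts per split; these are identical for all five configurations.
NUM_EXAMPLES = {"train": 74724, "test": 9341, "validation": 9341}


def size_matches(config_name):
    """Return True if the split byte counts sum to the reported dataset size."""
    return sum(SPLIT_BYTES[config_name].values()) == DATASET_SIZE[config_name]


def split_fractions():
    """Return each split's share of the total number of examples."""
    total = sum(NUM_EXAMPLES.values())
    return {split: count / total for split, count in NUM_EXAMPLES.items()}


if __name__ == "__main__":
    for name in SPLIT_BYTES:
        print(name, "sizes consistent:", size_matches(name))
    for split, fraction in split_fractions().items():
        print(f"{split}: {fraction:.1%}")
```

With these numbers the split sizes add up to the reported dataset size in every configuration, and the example counts correspond to roughly an 80/10/10 train/test/validation split.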
Data file paths
- prob-01:
  - train: prob-01/train-*
  - test: prob-01/test-*
  - validation: prob-01/validation-*
- prob-02:
  - train: prob-02/train-*
  - test: prob-02/test-*
  - validation: prob-02/validation-*
- prob-03:
  - train: prob-03/train-*
  - test: prob-03/test-*
  - validation: prob-03/validation-*
- prob-04:
  - train: prob-04/train-*
  - test: prob-04/test-*
  - validation: prob-04/validation-*
- prob-05:
  - train: prob-05/train-*
  - test: prob-05/test-*
  - validation: prob-05/validation-*
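The path patterns above are shell-style globs, one per split and configuration. The following minimal sketch shows how they could be rebuilt and matched, and how one configuration might be loaded with the Hugging Face `datasets` library. The helper names (`data_file_patterns`, `load_config`) and the shard file name in the usage note are hypothetical, chosen for illustration; the page does not list actual file names. `load_config` assumes the `datasets` package is installed and network access is available.

```python
from fnmatch import fnmatch

REPO_ID = "gayanin/pubmed-abstracts-noised-with-gcd-dist"
CONFIG_NAMES = [f"prob-0{i}" for i in range(1, 6)]  # prob-01 ... prob-05
SPLITS = ("train", "test", "validation")


def data_file_patterns(config_name):
    """Rebuild the split -> glob pattern mapping listed for one configuration."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown configuration: {config_name!r}")
    return {split: f"{config_name}/{split}-*" for split in SPLITS}


def load_config(config_name="prob-01"):
    """Download one configuration from the Hub (needs network access).

    The import is deferred so the pure helpers above work even when the
    `datasets` package is not installed.
    """
    from datasets import load_dataset

    # The second positional argument selects the configuration by name.
    return load_dataset(REPO_ID, config_name)


if __name__ == "__main__":
    for name in CONFIG_NAMES:
        print(name, data_file_patterns(name))
```

For example, a hypothetical shard named `prob-01/train-00000-of-00001.parquet` would satisfy `fnmatch(shard, data_file_patterns("prob-01")["train"])` but not the `test` pattern. Calling `load_config("prob-03")` should return an object with `train`, `test`, and `validation` splits, each row carrying the `refs` and `trans` string fields.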



