allenai/layout_distribution_shift
Source: Hugging Face. Updated 2023-05-24; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/allenai/layout_distribution_shift
Resource description:
---
license: apache-2.0
dataset_info:
  features:
  - name: words
    sequence: string
  - name: bbox
    sequence:
      sequence: float64
  - name: labels
    sequence: int64
  - name: block_ids
    sequence: int64
  - name: line_ids
    sequence: int64
  - name: files
    dtype: string
  splits:
  - name: remapped_Acta_dev.json
    num_bytes: 9101699
    num_examples: 491
  - name: remapped_Acta_fewshot_finetune_10_pubs_dev_episode_0.json
    num_bytes: 27958
    num_examples: 2
  - name: remapped_Acta_fewshot_finetune_10_pubs_dev_episode_1.json
    num_bytes: 18241
    num_examples: 2
  - name: remapped_Acta_fewshot_finetune_10_pubs_dev_episode_2.json
    num_bytes: 45036
    num_examples: 2
  - name: remapped_Acta_fewshot_finetune_10_pubs_train_episode_0.json
    num_bytes: 2269140
    num_examples: 117
  - name: remapped_Acta_fewshot_finetune_10_pubs_train_episode_1.json
    num_bytes: 2011417
    num_examples: 102
  - name: remapped_Acta_fewshot_finetune_10_pubs_train_episode_2.json
    num_bytes: 2236354
    num_examples: 116
  - name: remapped_Acta_test.json
    num_bytes: 9450719
    num_examples: 495
  - name: remapped_Acta_train.json
    num_bytes: 71764609
    num_examples: 3848
  - name: remapped_BMC_dev.json
    num_bytes: 23369323
    num_examples: 503
  - name: remapped_BMC_fewshot_finetune_10_pubs_dev_episode_0.json
    num_bytes: 108560
    num_examples: 2
  - name: remapped_BMC_fewshot_finetune_10_pubs_dev_episode_1.json
    num_bytes: 67630
    num_examples: 2
  - name: remapped_BMC_fewshot_finetune_10_pubs_dev_episode_2.json
    num_bytes: 74671
    num_examples: 2
  - name: remapped_BMC_fewshot_finetune_10_pubs_train_episode_0.json
    num_bytes: 3696565
    num_examples: 82
  - name: remapped_BMC_fewshot_finetune_10_pubs_train_episode_1.json
    num_bytes: 3831159
    num_examples: 77
  - name: remapped_BMC_fewshot_finetune_10_pubs_train_episode_2.json
    num_bytes: 4578916
    num_examples: 96
  - name: remapped_BMC_test.json
    num_bytes: 25850198
    num_examples: 535
  - name: remapped_BMC_train.json
    num_bytes: 216531051
    num_examples: 4628
  - name: remapped_PLoS_dev.json
    num_bytes: 78334040
    num_examples: 1499
  - name: remapped_PLoS_fewshot_finetune_10_pubs_dev_episode_0.json
    num_bytes: 93335
    num_examples: 2
  - name: remapped_PLoS_fewshot_finetune_10_pubs_dev_episode_1.json
    num_bytes: 125366
    num_examples: 2
  - name: remapped_PLoS_fewshot_finetune_10_pubs_dev_episode_2.json
    num_bytes: 126234
    num_examples: 2
  - name: remapped_PLoS_fewshot_finetune_10_pubs_train_episode_0.json
    num_bytes: 6190119
    num_examples: 120
  - name: remapped_PLoS_fewshot_finetune_10_pubs_train_episode_1.json
    num_bytes: 5238068
    num_examples: 98
  - name: remapped_PLoS_fewshot_finetune_10_pubs_train_episode_2.json
    num_bytes: 5662127
    num_examples: 121
  - name: remapped_PLoS_test.json
    num_bytes: 77843621
    num_examples: 1480
  - name: remapped_PLoS_train.json
    num_bytes: 622303242
    num_examples: 11937
  - name: remapped_RU_dev.json
    num_bytes: 37618273
    num_examples: 689
  - name: remapped_RU_fewshot_finetune_10_pubs_dev_episode_0.json
    num_bytes: 140245
    num_examples: 2
  - name: remapped_RU_fewshot_finetune_10_pubs_dev_episode_1.json
    num_bytes: 135845
    num_examples: 2
  - name: remapped_RU_fewshot_finetune_10_pubs_dev_episode_2.json
    num_bytes: 153598
    num_examples: 2
  - name: remapped_RU_fewshot_finetune_10_pubs_train_episode_0.json
    num_bytes: 6575257
    num_examples: 116
  - name: remapped_RU_fewshot_finetune_10_pubs_train_episode_1.json
    num_bytes: 5998010
    num_examples: 105
  - name: remapped_RU_fewshot_finetune_10_pubs_train_episode_2.json
    num_bytes: 5014176
    num_examples: 99
  - name: remapped_RU_test.json
    num_bytes: 36500742
    num_examples: 665
  - name: remapped_RU_train.json
    num_bytes: 297906664
    num_examples: 5452
  - name: remapped_diverse_publications_125_publishers_dev.json
    num_bytes: 26129574
    num_examples: 493
  - name: remapped_diverse_publications_125_publishers_train.json
    num_bytes: 628804969
    num_examples: 13002
  - name: remapped_diverse_publications_25_publishers_dev.json
    num_bytes: 30070714
    num_examples: 606
  - name: remapped_diverse_publications_25_publishers_train.json
    num_bytes: 675457461
    num_examples: 13538
  download_size: 442657892
  dataset_size: 2921454926
---
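Provider:
allenai
Summary of original information
Dataset overview
Dataset features
- words: sequence of strings
- bbox: sequence of sequences of floats (float64)
- labels: sequence of integers (int64)
- block_ids: sequence of integers (int64)
- line_ids: sequence of integers (int64)
- files: string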
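
The feature list above suggests a token-level layout record. A minimal sketch of a consistency check follows, under two assumptions that the dataset card does not state: that `words`, `bbox`, `labels`, `block_ids`, and `line_ids` are parallel (one entry per token), and that each bounding box holds four coordinates. The function name `check_record` and the sample record are hypothetical and invented for illustration.

```python
from typing import Any, Dict, List

# Fields assumed to carry one entry per token (an assumption, not stated in the card).
TOKEN_FIELDS = ("words", "bbox", "labels", "block_ids", "line_ids")


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record; an empty list means it looks consistent."""
    missing = [f for f in TOKEN_FIELDS + ("files",) if f not in record]
    if missing:
        return [f"missing fields: {missing}"]

    problems: List[str] = []
    n_tokens = len(record["words"])
    # Every other token-level field should have the same length as `words`.
    for field in TOKEN_FIELDS[1:]:
        if len(record[field]) != n_tokens:
            problems.append(
                f"{field} has {len(record[field])} entries, expected {n_tokens}"
            )
    # Assumption: each bounding box is a list of four floats.
    for i, box in enumerate(record["bbox"]):
        if len(box) != 4:
            problems.append(f"bbox[{i}] has {len(box)} coordinates, expected 4")
    if not isinstance(record["files"], str):
        problems.append("files is not a string")
    return problems


# Hypothetical record, invented for illustration only (not taken from the dataset).
example = {
    "words": ["Layout", "shift"],
    "bbox": [[0.10, 0.20, 0.30, 0.25], [0.32, 0.20, 0.45, 0.25]],
    "labels": [0, 0],
    "block_ids": [0, 0],
    "line_ids": [0, 0],
    "files": "example.pdf",
}

print(check_record(example))
```

If the parallel-length assumption does not hold for the real data, the check would report mismatches rather than fail, so it can also serve as a quick way to test that assumption.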
Dataset splits
- remapped_Acta series:
  - dev: 491 examples, 9101699 bytes
  - test: 495 examples, 9450719 bytes
  - train: 3848 examples, 71764609 bytes
  - fewshot_finetune_10_pubs_dev_episode: 3 files, 2 examples each
  - fewshot_finetune_10_pubs_train_episode: 3 files with 117, 102, and 116 examples
- remapped_BMC series:
  - dev: 503 examples, 23369323 bytes
  - test: 535 examples, 25850198 bytes
  - train: 4628 examples, 216531051 bytes
  - fewshot_finetune_10_pubs_dev_episode: 3 files, 2 examples each
  - fewshot_finetune_10_pubs_train_episode: 3 files with 82, 77, and 96 examples
- remapped_PLoS series:
  - dev: 1499 examples, 78334040 bytes
  - test: 1480 examples, 77843621 bytes
  - train: 11937 examples, 622303242 bytes
  - fewshot_finetune_10_pubs_dev_episode: 3 files, 2 examples each
  - fewshot_finetune_10_pubs_train_episode: 3 files with 120, 98, and 121 examples
- remapped_RU series:
  - dev: 689 examples, 37618273 bytes
  - test: 665 examples, 36500742 bytes
  - train: 5452 examples, 297906664 bytes
  - fewshot_finetune_10_pubs_dev_episode: 3 files, 2 examples each
  - fewshot_finetune_10_pubs_train_episode: 3 files with 116, 105, and 99 examples
- remapped_diverse_publications series:
  - 125_publishers_dev: 493 examples, 26129574 bytes
  - 125_publishers_train: 13002 examples, 628804969 bytes
  - 25_publishers_dev: 606 examples, 30070714 bytes
  - 25_publishers_train: 13538 examples, 675457461 bytes
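
The split names follow a regular pattern, so they can be decomposed programmatically. The sketch below infers that pattern from the names listed in the metadata only; `SplitInfo` and `parse_split_name` are hypothetical helpers, not part of any library, and the reading of the name segments as source, partition, and episode is an assumption.

```python
import re
from typing import NamedTuple, Optional


class SplitInfo(NamedTuple):
    source: str             # e.g. "Acta", "BMC", "diverse_publications_25_publishers"
    partition: str          # "train", "dev", or "test"
    episode: Optional[int]  # few-shot episode index, or None for a full split


# Few-shot splits, e.g. remapped_Acta_fewshot_finetune_10_pubs_train_episode_0.json
_FEWSHOT = re.compile(
    r"^remapped_(?P<source>.+?)_fewshot_finetune_10_pubs_"
    r"(?P<partition>train|dev)_episode_(?P<episode>\d+)\.json$"
)
# Full splits, e.g. remapped_Acta_dev.json
_FULL = re.compile(r"^remapped_(?P<source>.+)_(?P<partition>train|dev|test)\.json$")


def parse_split_name(name: str) -> SplitInfo:
    """Split a name from the metadata into source, partition, and episode."""
    match = _FEWSHOT.match(name)
    if match:
        return SplitInfo(match["source"], match["partition"], int(match["episode"]))
    match = _FULL.match(name)
    if match:
        return SplitInfo(match["source"], match["partition"], None)
    raise ValueError(f"unrecognized split name: {name!r}")


print(parse_split_name("remapped_Acta_dev.json"))
print(parse_split_name("remapped_RU_fewshot_finetune_10_pubs_train_episode_2.json"))
```

The few-shot pattern is tried first because it is the more specific of the two; a name that matches neither raises `ValueError` instead of being guessed at.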
Dataset size
- Download size: 442657892 bytes
- Dataset size: 2921454926 bytes
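
The byte counts above are easier to read in binary units. A small sketch of the conversion follows; `to_binary_units` is a hypothetical helper, and describing the two figures as a compressed download versus the prepared data is an interpretation of the field names, not something the card states.

```python
DOWNLOAD_BYTES = 442_657_892    # download_size from the metadata
DATASET_BYTES = 2_921_454_926   # dataset_size from the metadata


def to_binary_units(num_bytes: int) -> str:
    """Format a byte count using 1024-based units (KiB, MiB, GiB, TiB)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} {units[-1]}"


print("download:", to_binary_units(DOWNLOAD_BYTES))
print("dataset: ", to_binary_units(DATASET_BYTES))
print("ratio:   ", round(DATASET_BYTES / DOWNLOAD_BYTES, 2))
```

This works out to roughly 422 MiB to download and about 2.72 GiB once prepared, a ratio of about 6.6.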



