
allenai/layout_distribution_shift

Source: Hugging Face. Updated 2023-05-24; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/allenai/layout_distribution_shift
Resource description:
---
license: apache-2.0
dataset_info:
  features:
  - name: words
    sequence: string
  - name: bbox
    sequence:
      sequence: float64
  - name: labels
    sequence: int64
  - name: block_ids
    sequence: int64
  - name: line_ids
    sequence: int64
  - name: files
    dtype: string
  splits:
  - name: remapped_Acta_dev.json
    num_bytes: 9101699
    num_examples: 491
  - name: remapped_Acta_fewshot_finetune_10_pubs_dev_episode_0.json
    num_bytes: 27958
    num_examples: 2
  - name: remapped_Acta_fewshot_finetune_10_pubs_dev_episode_1.json
    num_bytes: 18241
    num_examples: 2
  - name: remapped_Acta_fewshot_finetune_10_pubs_dev_episode_2.json
    num_bytes: 45036
    num_examples: 2
  - name: remapped_Acta_fewshot_finetune_10_pubs_train_episode_0.json
    num_bytes: 2269140
    num_examples: 117
  - name: remapped_Acta_fewshot_finetune_10_pubs_train_episode_1.json
    num_bytes: 2011417
    num_examples: 102
  - name: remapped_Acta_fewshot_finetune_10_pubs_train_episode_2.json
    num_bytes: 2236354
    num_examples: 116
  - name: remapped_Acta_test.json
    num_bytes: 9450719
    num_examples: 495
  - name: remapped_Acta_train.json
    num_bytes: 71764609
    num_examples: 3848
  - name: remapped_BMC_dev.json
    num_bytes: 23369323
    num_examples: 503
  - name: remapped_BMC_fewshot_finetune_10_pubs_dev_episode_0.json
    num_bytes: 108560
    num_examples: 2
  - name: remapped_BMC_fewshot_finetune_10_pubs_dev_episode_1.json
    num_bytes: 67630
    num_examples: 2
  - name: remapped_BMC_fewshot_finetune_10_pubs_dev_episode_2.json
    num_bytes: 74671
    num_examples: 2
  - name: remapped_BMC_fewshot_finetune_10_pubs_train_episode_0.json
    num_bytes: 3696565
    num_examples: 82
  - name: remapped_BMC_fewshot_finetune_10_pubs_train_episode_1.json
    num_bytes: 3831159
    num_examples: 77
  - name: remapped_BMC_fewshot_finetune_10_pubs_train_episode_2.json
    num_bytes: 4578916
    num_examples: 96
  - name: remapped_BMC_test.json
    num_bytes: 25850198
    num_examples: 535
  - name: remapped_BMC_train.json
    num_bytes: 216531051
    num_examples: 4628
  - name: remapped_PLoS_dev.json
    num_bytes: 78334040
    num_examples: 1499
  - name: remapped_PLoS_fewshot_finetune_10_pubs_dev_episode_0.json
    num_bytes: 93335
    num_examples: 2
  - name: remapped_PLoS_fewshot_finetune_10_pubs_dev_episode_1.json
    num_bytes: 125366
    num_examples: 2
  - name: remapped_PLoS_fewshot_finetune_10_pubs_dev_episode_2.json
    num_bytes: 126234
    num_examples: 2
  - name: remapped_PLoS_fewshot_finetune_10_pubs_train_episode_0.json
    num_bytes: 6190119
    num_examples: 120
  - name: remapped_PLoS_fewshot_finetune_10_pubs_train_episode_1.json
    num_bytes: 5238068
    num_examples: 98
  - name: remapped_PLoS_fewshot_finetune_10_pubs_train_episode_2.json
    num_bytes: 5662127
    num_examples: 121
  - name: remapped_PLoS_test.json
    num_bytes: 77843621
    num_examples: 1480
  - name: remapped_PLoS_train.json
    num_bytes: 622303242
    num_examples: 11937
  - name: remapped_RU_dev.json
    num_bytes: 37618273
    num_examples: 689
  - name: remapped_RU_fewshot_finetune_10_pubs_dev_episode_0.json
    num_bytes: 140245
    num_examples: 2
  - name: remapped_RU_fewshot_finetune_10_pubs_dev_episode_1.json
    num_bytes: 135845
    num_examples: 2
  - name: remapped_RU_fewshot_finetune_10_pubs_dev_episode_2.json
    num_bytes: 153598
    num_examples: 2
  - name: remapped_RU_fewshot_finetune_10_pubs_train_episode_0.json
    num_bytes: 6575257
    num_examples: 116
  - name: remapped_RU_fewshot_finetune_10_pubs_train_episode_1.json
    num_bytes: 5998010
    num_examples: 105
  - name: remapped_RU_fewshot_finetune_10_pubs_train_episode_2.json
    num_bytes: 5014176
    num_examples: 99
  - name: remapped_RU_test.json
    num_bytes: 36500742
    num_examples: 665
  - name: remapped_RU_train.json
    num_bytes: 297906664
    num_examples: 5452
  - name: remapped_diverse_publications_125_publishers_dev.json
    num_bytes: 26129574
    num_examples: 493
  - name: remapped_diverse_publications_125_publishers_train.json
    num_bytes: 628804969
    num_examples: 13002
  - name: remapped_diverse_publications_25_publishers_dev.json
    num_bytes: 30070714
    num_examples: 606
  - name: remapped_diverse_publications_25_publishers_train.json
    num_bytes: 675457461
    num_examples: 13538
  download_size: 442657892
  dataset_size: 2921454926
---
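Provider:
allenai
Summary of original information

Dataset overview

Dataset features

  • words: sequence of strings
  • bbox: sequence of sequences of floats (float64)
  • labels: sequence of integers (int64)
  • block_ids: sequence of integers (int64)
  • line_ids: sequence of integers (int64)
  • files: string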
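
The feature list above can be read as a per-record schema. The following is a minimal sketch of a type check against that list, not code from the dataset authors. The function name `check_record` and the sample record are hypothetical, and the four-coordinate boxes in the sample are an assumption: the listing only says that `bbox` is a sequence of float sequences.

```python
from typing import Any

# Element type expected for each flat sequence feature, per the list above.
SEQUENCE_FEATURES = {"words": str, "labels": int, "block_ids": int, "line_ids": int}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return a list of type problems; an empty list means the record matches."""
    problems = []
    for name, elem_type in SEQUENCE_FEATURES.items():
        values = record.get(name)
        if not isinstance(values, list) or not all(
            isinstance(v, elem_type) for v in values
        ):
            problems.append(f"{name}: expected a list of {elem_type.__name__}")
    # bbox is a sequence of sequences of floats.
    boxes = record.get("bbox")
    if not isinstance(boxes, list) or not all(
        isinstance(box, list) and all(isinstance(c, float) for c in box)
        for box in boxes
    ):
        problems.append("bbox: expected a list of lists of float")
    # files is a single string, not a sequence.
    if not isinstance(record.get("files"), str):
        problems.append("files: expected a string")
    return problems


# Hypothetical record: all values are invented for illustration, and the
# four-number box layout is an assumption rather than something the listing states.
example = {
    "words": ["Layout", "shift"],
    "bbox": [[0.1, 0.2, 0.3, 0.25], [0.35, 0.2, 0.5, 0.25]],
    "labels": [0, 0],
    "block_ids": [0, 0],
    "line_ids": [0, 0],
    "files": "example.json",
}
```

Calling `check_record(example)` returns an empty list for the sample above; a record whose `labels` held strings would produce a single message naming that field.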

Dataset splits

  • remapped_Acta series:

    • dev: 491 examples, 9101699 bytes
    • test: 495 examples, 9450719 bytes
    • train: 3848 examples, 71764609 bytes
    • fewshot_finetune_10_pubs_dev_episode: 3 files, 2 examples each
    • fewshot_finetune_10_pubs_train_episode: 3 files with 117, 102, and 116 examples
  • remapped_BMC series:

    • dev: 503 examples, 23369323 bytes
    • test: 535 examples, 25850198 bytes
    • train: 4628 examples, 216531051 bytes
    • fewshot_finetune_10_pubs_dev_episode: 3 files, 2 examples each
    • fewshot_finetune_10_pubs_train_episode: 3 files with 82, 77, and 96 examples
  • remapped_PLoS series:

    • dev: 1499 examples, 78334040 bytes
    • test: 1480 examples, 77843621 bytes
    • train: 11937 examples, 622303242 bytes
    • fewshot_finetune_10_pubs_dev_episode: 3 files, 2 examples each
    • fewshot_finetune_10_pubs_train_episode: 3 files with 120, 98, and 121 examples
  • remapped_RU series:

    • dev: 689 examples, 37618273 bytes
    • test: 665 examples, 36500742 bytes
    • train: 5452 examples, 297906664 bytes
    • fewshot_finetune_10_pubs_dev_episode: 3 files, 2 examples each
    • fewshot_finetune_10_pubs_train_episode: 3 files with 116, 105, and 99 examples
  • remapped_diverse_publications series:

    • 125_publishers_dev: 493 examples, 26129574 bytes
    • 125_publishers_train: 13002 examples, 628804969 bytes
    • 25_publishers_dev: 606 examples, 30070714 bytes
    • 25_publishers_train: 13538 examples, 675457461 bytes
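
The split names follow a regular pattern, so the full list can be generated rather than typed by hand. This is a sketch that reconstructs the names from the pattern visible in the listing; the helper names `fewshot_split_name` and `all_split_names` are hypothetical, and the code only builds strings without contacting any service.

```python
SERIES = ["Acta", "BMC", "PLoS", "RU"]  # the four per-source series listed above
EPISODES = range(3)                      # few-shot episodes are numbered 0, 1, 2


def fewshot_split_name(series: str, part: str, episode: int) -> str:
    """Build one few-shot split name, e.g. for series 'Acta', part 'dev', episode 0."""
    return f"remapped_{series}_fewshot_finetune_10_pubs_{part}_episode_{episode}.json"


def all_split_names() -> list[str]:
    """Enumerate every split name that appears in the listing."""
    names = []
    for series in SERIES:
        # Full dev/test/train splits for this series.
        names += [f"remapped_{series}_{part}.json" for part in ("dev", "test", "train")]
        # Few-shot fine-tuning splits: dev and train, three episodes each.
        for part in ("dev", "train"):
            names += [fewshot_split_name(series, part, ep) for ep in EPISODES]
    # The diverse-publications series has dev and train only, with no test split.
    for publishers in (125, 25):
        for part in ("dev", "train"):
            names.append(
                f"remapped_diverse_publications_{publishers}_publishers_{part}.json"
            )
    return names
```

`all_split_names()` yields 40 names: nine for each of the four series plus four for the diverse-publications series, matching the count of split entries in the metadata.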

Dataset size

  • Download size: 442657892 bytes
  • Dataset size: 2921454926 bytes
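
To put the two byte counts in more familiar units, the sketch below converts them to GiB and computes how much larger the dataset is than the download. The constant and function names are hypothetical; only the two byte counts come from the listing.

```python
DOWNLOAD_BYTES = 442_657_892    # download size from the listing
DATASET_BYTES = 2_921_454_926   # dataset size from the listing


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


# How many times larger the dataset is than the files that are downloaded.
expansion_ratio = DATASET_BYTES / DOWNLOAD_BYTES

if __name__ == "__main__":
    print(f"download: {to_gib(DOWNLOAD_BYTES):.2f} GiB")
    print(f"dataset:  {to_gib(DATASET_BYTES):.2f} GiB")
    print(f"ratio:    {expansion_ratio:.2f}x")
```

The dataset size is roughly 6.6 times the download size, which suggests the downloaded files are stored in a compressed form, though the listing does not say so explicitly.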