
fbaigt/schema-to-json

Hugging Face, updated 2023-11-20, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/fbaigt/schema-to-json
Resource description:
---
license: gpl-3.0
configs:
- config_name: chemtables
  data_files:
  - split: train
    path: chemtables/train-*
  - split: validation
    path: chemtables/validation-*
  - split: test
    path: chemtables/test-*
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
- config_name: discomat
  data_files:
  - split: train
    path: discomat/train-*
  - split: validation
    path: discomat/validation-*
  - split: test
    path: discomat/test-*
- config_name: mltables
  data_files:
  - split: train
    path: mltables/train-*
  - split: validation
    path: mltables/validation-*
  - split: test
    path: mltables/test-*
dataset_info:
- config_name: chemtables
  features:
  - name: paper_id
    dtype: string
  - name: table_id
    dtype: string
  - name: table_code
    dtype: string
  - name: sup_text
    dtype: string
  - name: target_cells
    sequence:
    - name: cell_value
      dtype: string
    - name: cell_raw
      dtype: string
    - name: cell_index
      dtype: string
    - name: cell_row_idx
      dtype: int32
    - name: cell_col_idx
      dtype: int32
  - name: gold_json_records
    sequence:
    - name: cell_index
      dtype: string
    - name: cell_record
      dtype: string
  splits:
  - name: train
    num_bytes: 92180
    num_examples: 9
  - name: validation
    num_bytes: 39374
    num_examples: 3
  - name: test
    num_bytes: 117148
    num_examples: 14
  download_size: 124818
  dataset_size: 248702
- config_name: default
  features:
  - name: paper_id
    dtype: string
  - name: table_id
    dtype: string
  - name: table_code
    dtype: string
  - name: sup_text
    dtype: string
  - name: target_cells
    sequence:
    - name: cell_value
      dtype: string
    - name: cell_raw
      dtype: string
    - name: cell_index
      dtype: string
    - name: cell_row_idx
      dtype: int32
    - name: cell_col_idx
      dtype: int32
  - name: gold_json_records
    sequence:
    - name: cell_index
      dtype: string
    - name: cell_record
      dtype: string
  splits:
  - name: train
    num_bytes: 78484
    num_examples: 9
  - name: validation
    num_bytes: 37457
    num_examples: 3
  - name: test
    num_bytes: 113119
    num_examples: 14
  download_size: 122465
  dataset_size: 229060
- config_name: discomat
  features:
  - name: paper_id
    dtype: string
  - name: table_id
    dtype: string
  - name: table_code
    dtype: string
  - name: sup_text
    dtype: string
  - name: target_cells
    sequence:
    - name: cell_value_processed
      dtype: string
    - name: i
      dtype: int32
    - name: j
      dtype: int32
    - name: k
      dtype: int32
  - name: gold_json_records
    sequence:
    - name: cell_index
      sequence: int32
      length: 3
    - name: cell_record
      dtype: string
  splits:
  - name: train
    num_bytes: 2300237
    num_examples: 500
  - name: validation
    num_bytes: 2300237
    num_examples: 500
  - name: test
    num_bytes: 2366158
    num_examples: 487
  download_size: 1430344
  dataset_size: 6966632
- config_name: mltables
  features:
  - name: paper_id
    dtype: string
  - name: table_id
    dtype: string
  - name: table_code
    dtype: string
  - name: sup_text
    dtype: string
  - name: target_cells
    sequence:
    - name: cell_value
      dtype: string
    - name: cell_raw
      dtype: string
    - name: cell_value_char_idx_start
      dtype: int32
    - name: cell_value_char_idx_end
      dtype: int32
    - name: cell_raw_char_idx_start
      dtype: int32
    - name: cell_raw_char_idx_end
      dtype: int32
  - name: gold_json_records
    sequence:
    - name: cell_char_index
      sequence: int32
      length: 2
    - name: cell_record
      dtype: string
  splits:
  - name: train
    num_bytes: 696651
    num_examples: 43
  - name: validation
    num_bytes: 150816
    num_examples: 11
  - name: test
    num_bytes: 1248693
    num_examples: 68
  download_size: 605737
  dataset_size: 2096160
---
Provider:
fbaigt
Summary of original information

Dataset overview

License

  • GPL-3.0

Configurations

chemtables

  • Data file paths
    • Train: chemtables/train-*
    • Validation: chemtables/validation-*
    • Test: chemtables/test-*
  • Features
    • paper_id: string
    • table_id: string
    • table_code: string
    • sup_text: string
    • target_cells: sequence
      • cell_value: string
      • cell_raw: string
      • cell_index: string
      • cell_row_idx: int32
      • cell_col_idx: int32
    • gold_json_records: sequence
      • cell_index: string
      • cell_record: string
  • Splits
    • Train: 92180 bytes, 9 examples
    • Validation: 39374 bytes, 3 examples
    • Test: 117148 bytes, 14 examples
  • Download size: 124818 bytes
  • Dataset size: 248702 bytes
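
As a rough illustration of how the chemtables fields might be used together, the sketch below pairs each entry of target_cells with its gold record through the shared cell_index. It assumes the Hugging Face `datasets` convention that a sequence of named fields comes back as a dictionary of parallel lists, and that cell_record holds a JSON-encoded string; neither is confirmed by this page. The helper names and the sample row (including its cell_index values) are invented for illustration.

```python
import json


def rows_from_columns(columns):
    """Turn a dict of parallel lists into a list of per-item dicts."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]


def pair_cells_with_records(example):
    """Match each target cell to its gold record via the shared cell_index."""
    records = {
        rec["cell_index"]: json.loads(rec["cell_record"])  # assumes JSON text
        for rec in rows_from_columns(example["gold_json_records"])
    }
    return [
        (cell, records.get(cell["cell_index"]))  # None when no gold record exists
        for cell in rows_from_columns(example["target_cells"])
    ]


# Hypothetical row shaped like the chemtables schema; all values are made up.
sample = {
    "target_cells": {
        "cell_value": ["12.5", "7.1"],
        "cell_raw": ["12.5", "7.1"],
        "cell_index": ["r1c2", "r2c2"],
        "cell_row_idx": [1, 2],
        "cell_col_idx": [2, 2],
    },
    "gold_json_records": {
        "cell_index": ["r1c2"],
        "cell_record": ['{"value": "12.5"}'],
    },
}

pairs = pair_cells_with_records(sample)
```

Keying the lookup on cell_index rather than on list position avoids assuming that the two sequences have the same length or order.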

default

  • Data file paths
    • Train: data/train-*
    • Validation: data/validation-*
    • Test: data/test-*
  • Features
    • paper_id: string
    • table_id: string
    • table_code: string
    • sup_text: string
    • target_cells: sequence
      • cell_value: string
      • cell_raw: string
      • cell_index: string
      • cell_row_idx: int32
      • cell_col_idx: int32
    • gold_json_records: sequence
      • cell_index: string
      • cell_record: string
  • Splits
    • Train: 78484 bytes, 9 examples
    • Validation: 37457 bytes, 3 examples
    • Test: 113119 bytes, 14 examples
  • Download size: 122465 bytes
  • Dataset size: 229060 bytes
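
The configuration names and path patterns listed on this page suggest how a single configuration could be selected. The sketch below is a minimal, unverified example: `data_files_for` and `load_config` are hypothetical helpers, the repository id is taken from the page title, and the call to `datasets.load_dataset` requires network access and the third-party `datasets` package. Whether the dataset is still available under that id has not been checked.

```python
# Directory prefix used by each configuration, as listed on this page.
DATA_DIRS = {
    "default": "data",
    "chemtables": "chemtables",
    "discomat": "discomat",
    "mltables": "mltables",
}


def data_files_for(config_name):
    """Return the train/validation/test glob patterns for one configuration."""
    prefix = DATA_DIRS[config_name]
    return {split: f"{prefix}/{split}-*" for split in ("train", "validation", "test")}


def load_config(config_name="default"):
    """Load one configuration from the Hub (needs network and `datasets`)."""
    from datasets import load_dataset  # imported lazily; third-party package

    return load_dataset("fbaigt/schema-to-json", config_name)


patterns = data_files_for("default")
```

Calling `load_config("mltables")` would, under these assumptions, return the train, validation, and test splits of that configuration.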

discomat

  • Data file paths
    • Train: discomat/train-*
    • Validation: discomat/validation-*
    • Test: discomat/test-*
  • Features
    • paper_id: string
    • table_id: string
    • table_code: string
    • sup_text: string
    • target_cells: sequence
      • cell_value_processed: string
      • i: int32
      • j: int32
      • k: int32
    • gold_json_records: sequence
      • cell_index: sequence of int32, length 3
      • cell_record: string
  • Splits
    • Train: 2300237 bytes, 500 examples
    • Validation: 2300237 bytes, 500 examples
    • Test: 2366158 bytes, 487 examples
  • Download size: 1430344 bytes
  • Dataset size: 6966632 bytes
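
In the discomat configuration the gold cell_index is a length-3 integer sequence, while each target cell carries three integers i, j, and k. The sketch below assumes, without confirmation from this page, that cell_index corresponds to the triple (i, j, k) and that cell_record is a JSON-encoded string. The helper names and sample values are invented for illustration.

```python
import json


def index_discomat_records(example):
    """Map (i, j, k) tuples to decoded gold records for one table."""
    gold = example["gold_json_records"]
    return {
        tuple(idx): json.loads(rec)  # assumes cell_record is JSON text
        for idx, rec in zip(gold["cell_index"], gold["cell_record"])
    }


def labelled_cells(example):
    """Yield (cell_value_processed, record or None) for every target cell."""
    records = index_discomat_records(example)
    cells = example["target_cells"]
    for value, i, j, k in zip(
        cells["cell_value_processed"], cells["i"], cells["j"], cells["k"]
    ):
        # Assumption: a gold cell_index equals the cell's (i, j, k) triple.
        yield value, records.get((i, j, k))


# Hypothetical row shaped like the discomat schema; all values are made up.
sample = {
    "target_cells": {
        "cell_value_processed": ["60", "40"],
        "i": [0, 0],
        "j": [1, 1],
        "k": [0, 1],
    },
    "gold_json_records": {
        "cell_index": [[0, 1, 1]],
        "cell_record": ['{"kind": "example"}'],
    },
}

labelled = list(labelled_cells(sample))
```

Lists are converted to tuples before being used as dictionary keys because Python lists are not hashable.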

mltables

  • Data file paths
    • Train: mltables/train-*
    • Validation: mltables/validation-*
    • Test: mltables/test-*
  • Features
    • paper_id: string
    • table_id: string
    • table_code: string
    • sup_text: string
    • target_cells: sequence
      • cell_value: string
      • cell_raw: string
      • cell_value_char_idx_start: int32
      • cell_value_char_idx_end: int32
      • cell_raw_char_idx_start: int32
      • cell_raw_char_idx_end: int32
    • gold_json_records: sequence
      • cell_char_index: sequence of int32, length 2
      • cell_record: string
  • Splits
    • Train: 696651 bytes, 43 examples
    • Validation: 150816 bytes, 11 examples
    • Test: 1248693 bytes, 68 examples
  • Download size: 605737 bytes
  • Dataset size: 2096160 bytes
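
The mltables configuration locates cells by character offsets instead of row and column indices. The sketch below shows one way such offsets could be checked against the table source. It assumes that the offsets index into table_code and that the end offset is exclusive; this page states neither, so both should be verified on real rows. The helper name `cell_spans` and the sample row are invented for illustration.

```python
def cell_spans(example):
    """Slice table_code with each cell's character offsets."""
    code = example["table_code"]
    cells = example["target_cells"]
    spans = []
    for value, start, end in zip(
        cells["cell_value"],
        cells["cell_value_char_idx_start"],
        cells["cell_value_char_idx_end"],
    ):
        # Assumption: offsets index into table_code and `end` is exclusive.
        spans.append((value, code[start:end]))
    return spans


# Hypothetical row shaped like the mltables schema; all values are made up.
sample = {
    "table_code": "A & 91.2 | B & 88.0",
    "target_cells": {
        "cell_value": ["91.2", "88.0"],
        "cell_value_char_idx_start": [4, 15],
        "cell_value_char_idx_end": [8, 19],
    },
}

spans = cell_spans(sample)
# Cells whose stored value differs from the sliced text would hint that
# one of the assumptions above does not hold.
mismatches = [(value, text) for value, text in spans if value != text]
```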