fbaigt/schema-to-json
Source: Hugging Face. Updated 2023-11-20. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/fbaigt/schema-to-json
Resource description:
---
license: gpl-3.0
configs:
- config_name: chemtables
  data_files:
  - split: train
    path: chemtables/train-*
  - split: validation
    path: chemtables/validation-*
  - split: test
    path: chemtables/test-*
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
- config_name: discomat
  data_files:
  - split: train
    path: discomat/train-*
  - split: validation
    path: discomat/validation-*
  - split: test
    path: discomat/test-*
- config_name: mltables
  data_files:
  - split: train
    path: mltables/train-*
  - split: validation
    path: mltables/validation-*
  - split: test
    path: mltables/test-*
dataset_info:
- config_name: chemtables
  features:
  - name: paper_id
    dtype: string
  - name: table_id
    dtype: string
  - name: table_code
    dtype: string
  - name: sup_text
    dtype: string
  - name: target_cells
    sequence:
    - name: cell_value
      dtype: string
    - name: cell_raw
      dtype: string
    - name: cell_index
      dtype: string
    - name: cell_row_idx
      dtype: int32
    - name: cell_col_idx
      dtype: int32
  - name: gold_json_records
    sequence:
    - name: cell_index
      dtype: string
    - name: cell_record
      dtype: string
  splits:
  - name: train
    num_bytes: 92180
    num_examples: 9
  - name: validation
    num_bytes: 39374
    num_examples: 3
  - name: test
    num_bytes: 117148
    num_examples: 14
  download_size: 124818
  dataset_size: 248702
- config_name: default
  features:
  - name: paper_id
    dtype: string
  - name: table_id
    dtype: string
  - name: table_code
    dtype: string
  - name: sup_text
    dtype: string
  - name: target_cells
    sequence:
    - name: cell_value
      dtype: string
    - name: cell_raw
      dtype: string
    - name: cell_index
      dtype: string
    - name: cell_row_idx
      dtype: int32
    - name: cell_col_idx
      dtype: int32
  - name: gold_json_records
    sequence:
    - name: cell_index
      dtype: string
    - name: cell_record
      dtype: string
  splits:
  - name: train
    num_bytes: 78484
    num_examples: 9
  - name: validation
    num_bytes: 37457
    num_examples: 3
  - name: test
    num_bytes: 113119
    num_examples: 14
  download_size: 122465
  dataset_size: 229060
- config_name: discomat
  features:
  - name: paper_id
    dtype: string
  - name: table_id
    dtype: string
  - name: table_code
    dtype: string
  - name: sup_text
    dtype: string
  - name: target_cells
    sequence:
    - name: cell_value_processed
      dtype: string
    - name: i
      dtype: int32
    - name: j
      dtype: int32
    - name: k
      dtype: int32
  - name: gold_json_records
    sequence:
    - name: cell_index
      sequence: int32
      length: 3
    - name: cell_record
      dtype: string
  splits:
  - name: train
    num_bytes: 2300237
    num_examples: 500
  - name: validation
    num_bytes: 2300237
    num_examples: 500
  - name: test
    num_bytes: 2366158
    num_examples: 487
  download_size: 1430344
  dataset_size: 6966632
- config_name: mltables
  features:
  - name: paper_id
    dtype: string
  - name: table_id
    dtype: string
  - name: table_code
    dtype: string
  - name: sup_text
    dtype: string
  - name: target_cells
    sequence:
    - name: cell_value
      dtype: string
    - name: cell_raw
      dtype: string
    - name: cell_value_char_idx_start
      dtype: int32
    - name: cell_value_char_idx_end
      dtype: int32
    - name: cell_raw_char_idx_start
      dtype: int32
    - name: cell_raw_char_idx_end
      dtype: int32
  - name: gold_json_records
    sequence:
    - name: cell_char_index
      sequence: int32
      length: 2
    - name: cell_record
      dtype: string
  splits:
  - name: train
    num_bytes: 696651
    num_examples: 43
  - name: validation
    num_bytes: 150816
    num_examples: 11
  - name: test
    num_bytes: 1248693
    num_examples: 68
  download_size: 605737
  dataset_size: 2096160
---
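The metadata above declares four configurations, each mapping the train, validation, and test splits to a file pattern. The following is a minimal sketch of how those mappings could be used from Python. It assumes the Hugging Face `datasets` library is installed and the repository is reachable; `split_patterns` and `load_config` are hypothetical helper names introduced here for illustration only.

```python
# Directory prefix per configuration, copied from the `configs` block above.
CONFIG_DIRS = {
    "chemtables": "chemtables",
    "default": "data",
    "discomat": "discomat",
    "mltables": "mltables",
}
SPLITS = ("train", "validation", "test")


def split_patterns(config_name):
    """Return the split -> file-pattern mapping declared for one configuration."""
    prefix = CONFIG_DIRS[config_name]
    return {split: f"{prefix}/{split}-*" for split in SPLITS}


def load_config(config_name, split=None):
    """Load one configuration from the Hub (needs `datasets` and network access)."""
    # Imported lazily so the helper above still works without the library.
    from datasets import load_dataset

    return load_dataset("fbaigt/schema-to-json", config_name, split=split)
```

For example, `load_config("mltables", split="test")` would be expected to return the 68 test examples listed in the metadata, provided the repository still matches this card.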
Provider:
fbaigt
Summary of original information
Dataset overview
License
- GPL-3.0
Configurations
chemtables
- Data file paths
  - Train: chemtables/train-*
  - Validation: chemtables/validation-*
  - Test: chemtables/test-*
- Features
  - paper_id: string
  - table_id: string
  - table_code: string
  - sup_text: string
  - target_cells: sequence
    - cell_value: string
    - cell_raw: string
    - cell_index: string
    - cell_row_idx: int32
    - cell_col_idx: int32
  - gold_json_records: sequence
    - cell_index: string
    - cell_record: string
- Split information
  - Train: 92180 bytes, 9 examples
  - Validation: 39374 bytes, 3 examples
  - Test: 117148 bytes, 14 examples
  - Download size: 124818 bytes
  - Dataset size: 248702 bytes
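When a `sequence` feature has named sub-fields, as `target_cells` does, the `datasets` library typically returns it as a dictionary of parallel lists rather than a list of per-cell records. The sketch below regroups such a column into one dictionary per cell. `to_rows` is a hypothetical helper, and the sample values are invented placeholders: this card does not document the real format of `cell_index` or the cell contents.

```python
def to_rows(columns):
    """Turn {"field": [v0, v1], ...} into [{"field": v0, ...}, {"field": v1, ...}]."""
    names = list(columns)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("all fields must have the same number of entries")
    return [dict(zip(names, values)) for values in zip(*(columns[n] for n in names))]


# Invented example shaped like `target_cells` in the chemtables configuration.
target_cells = {
    "cell_value": ["12.5", "7.1"],
    "cell_raw": ["12.5 %", "7.1 %"],
    "cell_index": ["cell-a", "cell-b"],
    "cell_row_idx": [1, 2],
    "cell_col_idx": [2, 2],
}
rows = to_rows(target_cells)
```

The same helper would apply to `gold_json_records`, since it is declared as a sequence of named sub-fields as well.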
default
- Data file paths
  - Train: data/train-*
  - Validation: data/validation-*
  - Test: data/test-*
- Features
  - paper_id: string
  - table_id: string
  - table_code: string
  - sup_text: string
  - target_cells: sequence
    - cell_value: string
    - cell_raw: string
    - cell_index: string
    - cell_row_idx: int32
    - cell_col_idx: int32
  - gold_json_records: sequence
    - cell_index: string
    - cell_record: string
- Split information
  - Train: 78484 bytes, 9 examples
  - Validation: 37457 bytes, 3 examples
  - Test: 113119 bytes, 14 examples
  - Download size: 122465 bytes
  - Dataset size: 229060 bytes
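For each configuration, the declared dataset size should equal the sum of its three split sizes. A quick way to check this is sketched below, using the byte counts from the metadata in this card; `size_mismatches` is a hypothetical helper name.

```python
# (train, validation, test) byte counts and the declared dataset_size per configuration.
SPLIT_BYTES = {
    "chemtables": ((92180, 39374, 117148), 248702),
    "default": ((78484, 37457, 113119), 229060),
    "discomat": ((2300237, 2300237, 2366158), 6966632),
    "mltables": ((696651, 150816, 1248693), 2096160),
}


def size_mismatches(table):
    """Return the configurations whose split sizes do not add up to dataset_size."""
    return [name for name, (splits, total) in table.items() if sum(splits) != total]


mismatches = size_mismatches(SPLIT_BYTES)  # an empty list means every total is consistent
```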
discomat
- Data file paths
  - Train: discomat/train-*
  - Validation: discomat/validation-*
  - Test: discomat/test-*
- Features
  - paper_id: string
  - table_id: string
  - table_code: string
  - sup_text: string
  - target_cells: sequence
    - cell_value_processed: string
    - i: int32
    - j: int32
    - k: int32
  - gold_json_records: sequence
    - cell_index: sequence of int32, length 3
    - cell_record: string
- Split information
  - Train: 2300237 bytes, 500 examples
  - Validation: 2300237 bytes, 500 examples
  - Test: 2366158 bytes, 487 examples
  - Download size: 1430344 bytes
  - Dataset size: 6966632 bytes
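In the discomat configuration, each target cell carries three integers (`i`, `j`, `k`) and each gold record carries a three-element `cell_index`. The sketch below pairs the two. It rests on two assumptions that this card does not confirm: that `cell_index` holds the same (i, j, k) triple, and that `cell_record` is a JSON-encoded string. `records_by_cell` is a hypothetical helper and the sample values are invented.

```python
import json


def records_by_cell(target_cells, gold_json_records):
    """Map each (i, j, k) in target_cells to its decoded gold record, or None."""
    # Assumption: cell_index is the (i, j, k) triple and cell_record is a JSON string.
    gold = {
        tuple(index): json.loads(record)
        for index, record in zip(
            gold_json_records["cell_index"], gold_json_records["cell_record"]
        )
    }
    keys = zip(target_cells["i"], target_cells["j"], target_cells["k"])
    return {key: gold.get(key) for key in keys}


# Invented example shaped like one discomat row.
target_cells = {
    "cell_value_processed": ["45.0", "55.0"],
    "i": [0, 0],
    "j": [1, 1],
    "k": [1, 2],
}
gold_json_records = {
    "cell_index": [[0, 1, 1]],
    "cell_record": ['{"value": "45.0", "type": "example"}'],
}
joined = records_by_cell(target_cells, gold_json_records)
```

Cells without a matching gold record map to `None`, which makes missing annotations easy to spot.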
mltables
- Data file paths
  - Train: mltables/train-*
  - Validation: mltables/validation-*
  - Test: mltables/test-*
- Features
  - paper_id: string
  - table_id: string
  - table_code: string
  - sup_text: string
  - target_cells: sequence
    - cell_value: string
    - cell_raw: string
    - cell_value_char_idx_start: int32
    - cell_value_char_idx_end: int32
    - cell_raw_char_idx_start: int32
    - cell_raw_char_idx_end: int32
  - gold_json_records: sequence
    - cell_char_index: sequence of int32, length 2
    - cell_record: string
- Split information
  - Train: 696651 bytes, 43 examples
  - Validation: 150816 bytes, 11 examples
  - Test: 1248693 bytes, 68 examples
  - Download size: 605737 bytes
  - Dataset size: 2096160 bytes
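Unlike the other configurations, mltables locates cells by character offsets rather than by row and column. The sketch below shows how such offsets could be turned back into text. It assumes that the offsets index into `table_code` and that the end offset is exclusive, as in Python slicing; neither is confirmed by this card, so the `end_inclusive` flag is provided for the other convention. `cell_spans` is a hypothetical helper and the sample table is invented.

```python
def cell_spans(table_code, target_cells, end_inclusive=False):
    """Slice the value text and raw text of every target cell out of table_code."""
    extra = 1 if end_inclusive else 0
    offsets = zip(
        target_cells["cell_value_char_idx_start"],
        target_cells["cell_value_char_idx_end"],
        target_cells["cell_raw_char_idx_start"],
        target_cells["cell_raw_char_idx_end"],
    )
    return [
        {
            "value": table_code[v_start : v_end + extra],
            "raw": table_code[r_start : r_end + extra],
        }
        for v_start, v_end, r_start, r_end in offsets
    ]


# Invented example: one cell whose raw form wraps the value in markup.
table_code = r"Model & \textbf{91.2} & 3"
target_cells = {
    "cell_value_char_idx_start": [16],
    "cell_value_char_idx_end": [20],
    "cell_raw_char_idx_start": [8],
    "cell_raw_char_idx_end": [21],
}
spans = cell_spans(table_code, target_cells)
```

Comparing the sliced text against the stored `cell_value` and `cell_raw` strings on a few real rows would show which offset convention the dataset actually uses.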



