TwinDoc/funsd
Source: Hugging Face. Updated 2024-05-20; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/TwinDoc/funsd
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: words
    sequence: string
  - name: bboxes
    sequence:
      sequence: int64
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': O
          '1': B-HEADER
          '2': I-HEADER
          '3': B-QUESTION
          '4': I-QUESTION
          '5': B-ANSWER
          '6': I-ANSWER
  - name: image_path
    dtype: string
  - name: image
    dtype: image
  - name: bbox_id
    sequence: int64
  - name: words_sort_tlbr
    sequence: string
  - name: bboxes_sort_tlbr
    sequence:
      sequence: int64
  - name: ner_tags_sort_tlbr
    sequence: int64
  - name: bbox_id__sort_tlbr
    sequence: int64
  splits:
  - name: train
    num_bytes: 14725225.0
    num_examples: 149
  - name: test
    num_bytes: 5358339.0
    num_examples: 50
  - name: test_10
    num_bytes: 1071658.2
    num_examples: 10
  download_size: 18253145
  dataset_size: 21155222.2
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: test_10
    path: data/test_10-*
---
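The metadata above lists `words_sort_tlbr`, `bboxes_sort_tlbr`, `ner_tags_sort_tlbr`, and `bbox_id__sort_tlbr` without saying how they were produced. The sketch below shows one plausible reading, assuming that each box is `[x0, y0, x1, y1]` and that "tlbr" means top-to-bottom, then left-to-right order. Both points are assumptions, and the function name `sort_tlbr` and the toy record are hypothetical.

```python
# Assumption: each box is [x0, y0, x1, y1] and "tlbr" means reading order,
# i.e. sort by the top edge first, then by the left edge. The dataset's own
# *_sort_tlbr fields may have been produced with a different rule.

def sort_tlbr(words, bboxes, ner_tags):
    """Reorder parallel per-word lists top-to-bottom, then left-to-right."""
    order = sorted(range(len(bboxes)), key=lambda i: (bboxes[i][1], bboxes[i][0]))
    return {
        "order": order,  # original indices in sorted order
        "words": [words[i] for i in order],
        "bboxes": [bboxes[i] for i in order],
        "ner_tags": [ner_tags[i] for i in order],
    }

# Toy record with invented values (not taken from the dataset).
example = sort_tlbr(
    words=["Smith", "Name:", "TITLE"],
    bboxes=[[120, 50, 180, 62], [40, 50, 100, 62], [60, 10, 140, 25]],
    ner_tags=[5, 3, 1],
)
print(example["words"])
```

Keeping the permutation in `order` makes it possible to map the sorted lists back to the original word positions, which may be the role that `bbox_id__sort_tlbr` plays in the dataset.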
Provider:
TwinDoc
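Summary of original information
Dataset overview
Dataset features
- id: string
- words: sequence of strings
- bboxes: sequence of integer sequences
- ner_tags: sequence of class labels, with the following label names:
  - 0: O
  - 1: B-HEADER
  - 2: I-HEADER
  - 3: B-QUESTION
  - 4: I-QUESTION
  - 5: B-ANSWER
  - 6: I-ANSWER
- image_path: string
- image: image
- bbox_id: sequence of integers
- words_sort_tlbr: sequence of strings
- bboxes_sort_tlbr: sequence of integer sequences
- ner_tags_sort_tlbr: sequence of integers
- bbox_id__sort_tlbr: sequence of integers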
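The `ner_tags` ids listed above follow a B-/I- prefix scheme, so consecutive words can be grouped into header, question, and answer spans. The following is a minimal sketch of that grouping using only the label names from the list; the helper name `group_entities` and the sample words are hypothetical.

```python
# Label names in id order, copied from the feature list.
LABEL_NAMES = [
    "O", "B-HEADER", "I-HEADER",
    "B-QUESTION", "I-QUESTION",
    "B-ANSWER", "I-ANSWER",
]

def group_entities(words, ner_tags):
    """Group words into (entity_type, text) spans from B-/I- tag ids."""
    entities = []
    current_type, current_words = None, []
    for word, tag_id in zip(words, ner_tags):
        prefix, _, etype = LABEL_NAMES[tag_id].partition("-")
        if prefix == "I" and etype == current_type:
            # Continuation of the span that is currently open.
            current_words.append(word)
            continue
        if current_type is not None:
            # Any other tag closes the open span.
            entities.append((current_type, " ".join(current_words)))
        if prefix == "O":
            current_type, current_words = None, []
        else:
            # "B-" starts a new span; a stray "I-" is treated leniently as a start.
            current_type, current_words = etype, [word]
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities

# Invented sample, not taken from the dataset.
spans = group_entities(["Date:", "May", "5", "page", "TO:"], [3, 5, 6, 0, 3])
print(spans)
```

Treating a stray `I-` tag as the start of a new span is a design choice for robustness; a stricter decoder could instead discard such tokens.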
Dataset splits
- train: 149 examples, 14725225 bytes
- test: 50 examples, 5358339 bytes
- test_10: 10 examples, 1071658.2 bytes
Dataset size
- Download size: 18253145 bytes
- Total dataset size: 21155222.2 bytes
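As a quick consistency check on the figures above, the short sketch below sums the per-split byte counts and compares the result with the listed total. All numbers are copied from this page; the variable names are only illustrative.

```python
import math

# Per-split figures as listed on this page.
SPLITS = {
    "train": {"num_bytes": 14725225.0, "num_examples": 149},
    "test": {"num_bytes": 5358339.0, "num_examples": 50},
    "test_10": {"num_bytes": 1071658.2, "num_examples": 10},
}
DATASET_SIZE = 21155222.2  # listed total dataset size in bytes

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The listed total equals the sum of the three splits (compared with a
# tolerance because the values are floats).
sizes_match = math.isclose(total_bytes, DATASET_SIZE)
print(f"{total_examples} examples, sizes consistent: {sizes_match}")

for name, s in SPLITS.items():
    per_example_kib = s["num_bytes"] / s["num_examples"] / 1024
    print(f"{name}: {per_example_kib:.1f} KiB per example on average")
```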
Configuration
- default configuration:
  - train data path: data/train-*
  - test data path: data/test-*
  - test_10 data path: data/test_10-*
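The split-to-path mapping above can be illustrated with a small sketch. It assumes the repository id `TwinDoc/funsd` from the download link, uses `datasets.load_dataset` from the third-party Hugging Face `datasets` package for the actual load, and checks the glob patterns against shard file names that are hypothetical examples rather than the repository's real file listing.

```python
from fnmatch import fnmatch

# Glob patterns of the default configuration, copied from this page.
DATA_FILES = {
    "train": "data/train-*",
    "test": "data/test-*",
    "test_10": "data/test_10-*",
}

def splits_for(path):
    """Return the split names whose glob pattern matches a file path."""
    return [split for split, pattern in DATA_FILES.items() if fnmatch(path, pattern)]

def load_split(split="test_10"):
    """Load one split from the Hub (needs network access and `pip install datasets`)."""
    from datasets import load_dataset  # imported lazily; third-party package
    return load_dataset("TwinDoc/funsd", split=split)

# Hypothetical shard names, used only to show how the patterns resolve.
for shard in ["data/train-00000-of-00001.parquet",
              "data/test-00000-of-00001.parquet",
              "data/test_10-00000-of-00001.parquet"]:
    print(shard, "->", splits_for(shard))
```

Because the `test` pattern requires a hyphen directly after `test`, it does not also capture `test_10` shards, so each file resolves to exactly one split. Calling `load_split("test_10")` would fetch the 10-example split, which is convenient for smoke tests.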



