EphronM/usa_passports_ocr_funsd
Hugging Face | Updated 2024-04-17 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/EphronM/usa_passports_ocr_funsd
Resource description:
---
dataset_info:
  features:
  - name: name
    dtype: string
  - name: page_no
    dtype: int64
  - name: width
    dtype: int64
  - name: height
    dtype: int64
  - name: text
    sequence: string
  - name: bbox
    sequence:
      sequence: int64
  - name: segment_bbox
    sequence:
      sequence: int64
  - name: segment_id
    sequence: int64
  - name: qas
    struct:
    - name: answers
      list:
      - name: answer_end
        sequence: int64
      - name: answer_start
        sequence: int64
      - name: text
        sequence: string
    - name: question
      sequence: string
    - name: question_id
      sequence: int64
  - name: image
    dtype: string
  - name: md5sum
    dtype: string
  splits:
  - name: train
    num_bytes: 235094122
    num_examples: 72
  - name: test
    num_bytes: 65985649
    num_examples: 20
  download_size: 300115943
  dataset_size: 301079771
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
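The metadata above declares a single `default` config with `train` and `test` splits. A minimal sketch of loading them follows, assuming the Hugging Face `datasets` library is installed and the repository is reachable over the network. The helper name `load_passport_splits` is hypothetical and not part of the dataset card.

```python
REPO_ID = "EphronM/usa_passports_ocr_funsd"  # repository name from the listing


def load_passport_splits(repo_id: str = REPO_ID):
    """Download the dataset and return its (train, test) splits.

    The import is deferred so that this module can be imported without
    the `datasets` package; calling the function requires the package
    and network access.
    """
    from datasets import load_dataset  # Hugging Face `datasets` library

    dataset = load_dataset(repo_id)  # DatasetDict keyed by split name
    return dataset["train"], dataset["test"]
```

Calling `train, test = load_passport_splits()` should yield two splits whose lengths can be compared with the 72 and 20 examples declared in the metadata.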
Provider:
EphronM
Summary of original information
Dataset overview
Dataset features
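- name: string
- page_no: integer
- width: integer
- height: integer
- text: sequence of strings
- bbox: sequence of integer sequences
- segment_bbox: sequence of integer sequences
- segment_id: sequence of integers
- qas: struct containing:
  - answers: list containing:
    - answer_end: sequence of integers
    - answer_start: sequence of integers
    - text: sequence of strings
  - question: sequence of strings
  - question_id: sequence of integers
- image: string
- md5sum: string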
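The feature list above can be turned into a quick structural check on a record. The following is only a sketch: the function name `check_record` and the sample record are made up for illustration, and the rule that `text` and `bbox` have equal length is an assumption based on the FUNSD-style naming, not something the listing states.

```python
from typing import Any

# Top-level field names and the Python types implied by the listed dtypes.
EXPECTED_FIELDS: dict[str, type] = {
    "name": str,
    "page_no": int,
    "width": int,
    "height": int,
    "text": list,
    "bbox": list,
    "segment_bbox": list,
    "segment_id": list,
    "qas": dict,
    "image": str,
    "md5sum": str,
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    # Assumption (not stated in the listing): one bounding box per text entry.
    text, boxes = record.get("text"), record.get("bbox")
    if isinstance(text, list) and isinstance(boxes, list) and len(text) != len(boxes):
        problems.append("text and bbox lengths differ")
    return problems


# Made-up record for illustration only; it is not taken from the dataset.
sample = {
    "name": "example",
    "page_no": 0,
    "width": 1000,
    "height": 700,
    "text": ["PASSPORT", "USA"],
    "bbox": [[10, 10, 200, 40], [210, 10, 300, 40]],
    "segment_bbox": [[10, 10, 300, 40], [10, 10, 300, 40]],
    "segment_id": [0, 0],
    "qas": {"answers": [], "question": [], "question_id": []},
    "image": "",
    "md5sum": "",
}

print(check_record(sample))
```

The check covers only the top-level fields; the nested `qas` contents would need their own rules once their exact semantics are known.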
Dataset splits
- train: 72 examples, 235094122 bytes
- test: 20 examples, 65985649 bytes
Dataset size
- Download size: 300115943 bytes
- Dataset size: 301079771 bytes
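The split and size figures can be cross-checked with a little arithmetic: the two `num_bytes` values add up to the reported dataset size of 301079771 bytes. The sketch below copies the numbers from the listing and also derives each split's share of the examples and a rough mean size per example.

```python
# Figures copied from the split listing.
SPLITS = {
    "train": {"num_examples": 72, "num_bytes": 235094122},
    "test": {"num_examples": 20, "num_bytes": 65985649},
}
REPORTED_DATASET_SIZE = 301079771  # dataset_size from the metadata

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# The per-split byte counts should add up to the reported dataset size.
sizes_agree = total_bytes == REPORTED_DATASET_SIZE

for name, split in SPLITS.items():
    share = split["num_examples"] / total_examples
    mean_mb = split["num_bytes"] / split["num_examples"] / 1e6
    print(f"{name}: {share:.1%} of examples, about {mean_mb:.2f} MB per example")
print("sizes agree:", sizes_agree)
```

With 92 examples in total, the split is roughly 78% train and 22% test, and each example averages a little over 3 MB.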
Data file configuration
- default config:
  - train: path data/train-*
  - test: path data/test-*
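The paths `data/train-*` and `data/test-*` are glob patterns, so each split may be stored as one or more files sharing a prefix. The sketch below illustrates the matching with Python's standard `fnmatch` module. It is not how the Hub itself resolves files, and the file names used are hypothetical because the listing does not give the actual shard names.

```python
from fnmatch import fnmatch
from typing import Optional

# Split name -> glob pattern, as listed for the default config.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical file names, for illustration only.
for candidate in (
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
):
    print(candidate, "->", split_for(candidate))
```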



