jp1924/OCRDataPublic-inst
Source: Hugging Face. Updated 2024-05-10; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/jp1924/OCRDataPublic-inst
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: meta_data
    struct:
    - name: object_recognition
      dtype: int32
    - name: text_language
      dtype: int32
    - name: category
      dtype: int32
    - name: identifier
      dtype: string
    - name: label_path
      dtype: string
    - name: name
      dtype: string
    - name: src_path
      dtype: string
    - name: type
      dtype: string
    - name: acquisition_location
      dtype: int32
    - name: data_captured
      dtype: string
    - name: dpi
      dtype: int32
    - name: group
      dtype: int32
    - name: height
      dtype: int32
    - name: width
      dtype: int32
    - name: writing_style
      dtype: int32
    - name: year
      dtype: int32
  - name: objects
    list:
    - name: id
      dtype: int32
    - name: text
      dtype: string
    - name: bbox
      list: int32
    - name: meta
      struct:
      - name: type
        dtype: string
      - name: text_type
        dtype: string
  - name: context
    dtype: string
  - name: question
    sequence: string
  - name: answer
    sequence: string
  splits:
  - name: train
    num_bytes: 14019183278.0
    num_examples: 40000
  - name: validation
    num_bytes: 1066622883.0
    num_examples: 3000
  download_size: 10970846438
  dataset_size: 15085806161.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---
Provider:
jp1924
Summary of original information
Dataset overview
Dataset features
- image: image
- meta_data: struct with the following fields:
  - object_recognition: int32
  - text_language: int32
  - category: int32
  - identifier: string
  - label_path: string
  - name: string
  - src_path: string
  - type: string
  - acquisition_location: int32
  - data_captured: string
  - dpi: int32
  - group: int32
  - height: int32
  - width: int32
  - writing_style: int32
  - year: int32
- objects: list whose items have the following fields:
  - id: int32
  - text: string
  - bbox: list of int32
  - meta: struct with the following fields:
    - type: string
    - text_type: string
- context: string
- question: sequence of string
- answer: sequence of string
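To make the nested layout above concrete, here is a minimal Python sketch of how one record might be traversed once it is available as a plain dictionary. The sample record is invented for illustration: its values, the bbox numbers, and the helper names `object_texts` and `qa_pairs` are all hypothetical. It also assumes that `question` and `answer` are index-aligned, which the card does not state.

```python
from typing import Any, Dict, List, Tuple


def object_texts(example: Dict[str, Any]) -> List[str]:
    """Return the text of every annotated object, in stored order."""
    return [obj["text"] for obj in example.get("objects", [])]


def qa_pairs(example: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pair each question with an answer.

    Assumption (not stated in the card): question[i] belongs to answer[i].
    """
    questions = list(example.get("question", []))
    answers = list(example.get("answer", []))
    if len(questions) != len(answers):
        raise ValueError("question and answer sequences differ in length")
    return list(zip(questions, answers))


# Hypothetical record that only mirrors the field names and types listed
# above; every value is made up and the image field is omitted.
sample = {
    "meta_data": {"identifier": "doc-0001", "width": 2480, "height": 3508, "dpi": 300},
    "objects": [
        {"id": 1, "text": "Notice", "bbox": [10, 20, 110, 60],
         "meta": {"type": "bbox", "text_type": "print"}},
        {"id": 2, "text": "2024", "bbox": [120, 20, 180, 60],
         "meta": {"type": "bbox", "text_type": "print"}},
    ],
    "context": "Notice 2024",
    "question": ["What year appears in the document?"],
    "answer": ["2024"],
}

print(object_texts(sample))
print(qa_pairs(sample))
```

The card gives `bbox` only as a list of int32, so the sketch passes it through untouched and does not assume a coordinate convention.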
Dataset splits
- train: 40000 examples, 14019183278 bytes in total.
- validation: 3000 examples, 1066622883 bytes in total.
Dataset size
- Download size: 10970846438 bytes
- Total dataset size: 15085806161 bytes
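The byte counts are easier to reason about after a little arithmetic. The sketch below uses only the figures from the card to check that the two splits add up to the reported total, and to derive sizes in GiB, the download-to-dataset ratio, and the mean bytes per example. The helper names are illustrative, not part of any library.

```python
# Figures copied from the dataset card.
SPLITS = {
    "train": {"num_bytes": 14_019_183_278, "num_examples": 40_000},
    "validation": {"num_bytes": 1_066_622_883, "num_examples": 3_000},
}
DOWNLOAD_SIZE = 10_970_846_438
DATASET_SIZE = 15_085_806_161

GIB = 1024 ** 3  # bytes per gibibyte


def total_bytes(splits: dict) -> int:
    """Sum num_bytes over all splits."""
    return sum(info["num_bytes"] for info in splits.values())


def mean_bytes_per_example(info: dict) -> float:
    """Average stored size of one example in a split."""
    return info["num_bytes"] / info["num_examples"]


print("splits sum to dataset_size:", total_bytes(SPLITS) == DATASET_SIZE)
print(f"dataset size:  {DATASET_SIZE / GIB:.2f} GiB")
print(f"download size: {DOWNLOAD_SIZE / GIB:.2f} GiB")
print(f"download / dataset ratio: {DOWNLOAD_SIZE / DATASET_SIZE:.2f}")
for name, info in SPLITS.items():
    print(f"{name}: {mean_bytes_per_example(info) / 1024:.0f} KiB per example")
```

Both splits average roughly 350 KB per example, so a rough disk estimate for a subset can be made by scaling the example count.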
Data file configuration
- default config:
  - train: path is data/train-*
  - validation: path is data/validation-*
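The `default` config maps each split to a glob pattern. As a rough sketch of what that mapping means, the code below groups file paths by split, with Python's `fnmatch` standing in for whatever pattern resolution the hosting library actually performs. The shard file names are hypothetical, since the card gives only the patterns. The `load_split` helper shows how a split could be streamed with the Hugging Face `datasets` library; it assumes the library is installed and the repository is reachable, and is not called here.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Split-to-pattern mapping copied from the `default` config.
DATA_FILES = {"train": "data/train-*", "validation": "data/validation-*"}


def assign_splits(paths: List[str]) -> Dict[str, List[str]]:
    """Group file paths by the split whose pattern they match."""
    grouped: Dict[str, List[str]] = {split: [] for split in DATA_FILES}
    for path in paths:
        for split, pattern in DATA_FILES.items():
            if fnmatch(path, pattern):
                grouped[split].append(path)
    return grouped


def load_split(split: str = "validation"):
    """Stream one split; needs `pip install datasets` and network access."""
    # Imported lazily so the helper above runs without the library.
    from datasets import load_dataset

    return load_dataset("jp1924/OCRDataPublic-inst", split=split, streaming=True)


# Hypothetical shard names, for illustration only.
example_paths = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/validation-00000-of-00001.parquet",
    "README.md",
]
print(assign_splits(example_paths))
```

Streaming avoids downloading the full archive up front, which may be preferable for a first look given the download size reported in the card.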



