fimu-docproc-research/dataset_easy_ocr_v0.3.0_multipage_cleaned
Source: Hugging Face. Updated: 2023-06-27. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/fimu-docproc-research/dataset_easy_ocr_v0.3.0_multipage_cleaned
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: words
    sequence: string
  - name: bboxes
    sequence:
      sequence: float32
  - name: image_path
    dtype: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': DIC
          '1': IBAN
          '2': ICO
          '3': O
          '4': account_number
          '5': bank_code
          '6': const_symbol
          '7': contr_address
          '8': contr_name
          '9': due_date
          '10': invoice_date
          '11': invoice_number
          '12': qr_code
          '13': spec_symbol
          '14': total_amount
          '15': var_symbol
  splits:
  - name: train
    num_bytes: 24111780
    num_examples: 2897
  - name: val
    num_bytes: 2718925
    num_examples: 321
  download_size: 7947488
  dataset_size: 26830705
---
# Dataset Card for "dataset_easy_ocr_v0.3.0_multipage_cleaned"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
fimu-docproc-research
Summary of original information
Dataset overview
Dataset name
- Name: dataset_easy_ocr_v0.3.0_multipage_cleaned
Dataset features
- id: string
- words: sequence of strings
- bboxes: sequence of float32 sequences
- image_path: string
- ner_tags: sequence of class labels, with the following classes:
  - DIC
  - IBAN
  - ICO
  - O
  - account_number
  - bank_code
  - const_symbol
  - contr_address
  - contr_name
  - due_date
  - invoice_date
  - invoice_number
  - qr_code
  - spec_symbol
  - total_amount
  - var_symbol
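The `ner_tags` feature stores integer class indices, so the label names have to be mapped back before the tags are readable. The following is a minimal sketch of that mapping. The label order is copied from the card's `class_label` names; the helper names (`decode_tags`, `tagged_words`), the sample record, and the assumption that `words` and `ner_tags` are aligned one-to-one are illustrative and not confirmed by the card.

```python
# Label names in class-index order, copied from the `ner_tags` feature above.
NER_LABELS = [
    "DIC", "IBAN", "ICO", "O", "account_number", "bank_code",
    "const_symbol", "contr_address", "contr_name", "due_date",
    "invoice_date", "invoice_number", "qr_code", "spec_symbol",
    "total_amount", "var_symbol",
]

ID2LABEL = dict(enumerate(NER_LABELS))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_tags(tag_ids):
    """Turn a sequence of integer class indices into label names."""
    return [ID2LABEL[i] for i in tag_ids]


def tagged_words(words, tag_ids, outside="O"):
    """Pair each word with its label, dropping words tagged as `outside`.

    Assumes (not stated in the card) that words and tags align one-to-one.
    """
    if len(words) != len(tag_ids):
        raise ValueError("words and ner_tags must have the same length")
    return [
        (word, ID2LABEL[tag])
        for word, tag in zip(words, tag_ids)
        if ID2LABEL[tag] != outside
    ]


# Hypothetical record, made up for illustration; not taken from the dataset.
example = {
    "words": ["Faktura", "2023001", "Celkem", "1250.00"],
    "ner_tags": [3, 11, 3, 14],
}

decoded = decode_tags(example["ner_tags"])
entities = tagged_words(example["words"], example["ner_tags"])
```

Note that the "outside" class `O` sits at index 3 rather than 0, because the names are sorted alphabetically with uppercase first. Code that assumes index 0 means "no entity" would mislabel `DIC` tokens.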
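Dataset splits
- Training set:
  - Number of examples: 2897
  - Storage size: 24111780 bytes
- Validation set:
  - Number of examples: 321
  - Storage size: 2718925 bytes
Dataset size
- Download size: 7947488 bytes
- Total dataset size: 26830705 bytes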
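The split and size figures can be cross-checked against each other. The sketch below uses only the numbers listed on this page; the variable names are illustrative. It confirms that the two split sizes add up to the reported total dataset size, and derives the validation share and the download-to-dataset size ratio.

```python
# Figures copied from the split and size listing above.
SPLITS = {
    "train": {"num_examples": 2897, "num_bytes": 24111780},
    "val": {"num_examples": 321, "num_bytes": 2718925},
}
DOWNLOAD_SIZE = 7947488   # bytes
DATASET_SIZE = 26830705   # bytes

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# Share of all examples that fall in the validation split.
val_fraction = SPLITS["val"]["num_examples"] / total_examples

# Ratio of the download size to the reported dataset size.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"examples: {total_examples}")
print(f"split bytes match dataset_size: {total_bytes == DATASET_SIZE}")
print(f"validation share: {val_fraction:.1%}")
print(f"download / dataset size: {download_ratio:.2f}")
```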



