fimu-docproc-research/dataset_easy_ocr_v0.3.0_clean
Source: Hugging Face | Updated: 2023-06-17 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/fimu-docproc-research/dataset_easy_ocr_v0.3.0_clean
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: words
    sequence: string
  - name: bboxes
    sequence:
      sequence: float32
  - name: image_path
    dtype: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': DIC
          '1': IBAN
          '2': ICO
          '3': O
          '4': account_number
          '5': bank_code
          '6': const_symbol
          '7': contr_address
          '8': contr_name
          '9': due_date
          '10': invoice_date
          '11': invoice_number
          '12': qr_code
          '13': spec_symbol
          '14': total_amount
          '15': var_symbol
  splits:
  - name: train
    num_bytes: 28030910
    num_examples: 3212
  - name: val
    num_bytes: 3166612
    num_examples: 356
  download_size: 9291114
  dataset_size: 31197522
---
# Dataset Card for "dataset_easy_ocr_v0.3.0_clean"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
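The schema above can be mirrored in plain Python to sanity-check records before training. This is a minimal sketch, not tooling shipped with the dataset: the `InvoiceRecord` type, the `validate_record` helper and the sample record are hypothetical. The check assumes that `words`, `bboxes` and `ner_tags` are parallel, token-aligned sequences, which the card does not state explicitly; the number of coordinates per box is not documented either, so it is not checked.

```python
from typing import List, TypedDict


class InvoiceRecord(TypedDict):
    """One example, mirroring the `features` block of the card's YAML header."""
    id: str
    words: List[str]            # OCR tokens
    bboxes: List[List[float]]   # one float32 list per token (coordinate layout not documented)
    image_path: str             # path to the source page image
    ner_tags: List[int]         # class-label IDs


NUM_LABELS = 16  # class_label.names runs from '0' to '15'


def validate_record(rec: InvoiceRecord) -> None:
    """Hypothetical helper: raise ValueError if a record breaks the assumed alignment."""
    n = len(rec["words"])
    # Assumption: words, bboxes and ner_tags are parallel, token-aligned sequences.
    if len(rec["bboxes"]) != n or len(rec["ner_tags"]) != n:
        raise ValueError(f"{rec['id']}: sequence lengths differ")
    # Every tag must be a valid class-label ID.
    for tag in rec["ner_tags"]:
        if not 0 <= tag < NUM_LABELS:
            raise ValueError(f"{rec['id']}: tag {tag} out of range")


# Hypothetical record for illustration only (not taken from the dataset).
sample: InvoiceRecord = {
    "id": "example-0",
    "words": ["Invoice", "2023001"],
    "bboxes": [[10.0, 12.0, 80.0, 30.0], [90.0, 12.0, 160.0, 30.0]],
    "image_path": "images/example-0.png",
    "ner_tags": [3, 11],  # 'O', 'invoice_number'
}
validate_record(sample)  # passes silently
```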
Provider:
fimu-docproc-research
Summary of the original information
Dataset overview
Dataset name
- Name: dataset_easy_ocr_v0.3.0_clean
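Dataset features
- id: string
- words: sequence of strings
- bboxes: sequence of float32 sequences (a nested list)
- image_path: string
- ner_tags: sequence of class labels, with the following classes:
  - 0: DIC
  - 1: IBAN
  - 2: ICO
  - 3: O
  - 4: account_number
  - 5: bank_code
  - 6: const_symbol
  - 7: contr_address
  - 8: contr_name
  - 9: due_date
  - 10: invoice_date
  - 11: invoice_number
  - 12: qr_code
  - 13: spec_symbol
  - 14: total_amount
  - 15: var_symbol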
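Because `ner_tags` stores integer IDs, a model or a post-processing step needs the ID-to-name mapping listed above. The sketch below copies that order into `ID2LABEL` and groups the words of each non-`O` label into one field value. The helper `extract_fields` and the sample tokens are hypothetical. It assumes plain per-token labels (the names carry no B-/I- prefixes) and merges all tokens that share a label, so a page with two separate IBANs would have them joined into one string.

```python
from collections import defaultdict
from typing import Dict, List

# Index order copied from class_label.names in the card's YAML header.
ID2LABEL: List[str] = [
    "DIC", "IBAN", "ICO", "O", "account_number", "bank_code",
    "const_symbol", "contr_address", "contr_name", "due_date",
    "invoice_date", "invoice_number", "qr_code", "spec_symbol",
    "total_amount", "var_symbol",
]
LABEL2ID: Dict[str, int] = {name: i for i, name in enumerate(ID2LABEL)}


def extract_fields(words: List[str], ner_tags: List[int]) -> Dict[str, str]:
    """Hypothetical helper: join the words of each non-'O' label into one string.

    Assumes per-token labels without B-/I- prefixes and treats all tokens
    with the same label as one field (in reading order).
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for word, tag in zip(words, ner_tags):
        label = ID2LABEL[tag]
        if label != "O":
            grouped[label].append(word)
    return {label: " ".join(tokens) for label, tokens in grouped.items()}


# Hypothetical tokens for illustration only.
words = ["Invoice", "2023001", "Acme", "s.r.o.", "Total:", "1250.00"]
tags = [3, 11, 8, 8, 3, 14]
print(extract_fields(words, tags))
# {'invoice_number': '2023001', 'contr_name': 'Acme s.r.o.', 'total_amount': '1250.00'}
```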
Dataset splits
- Training set (train):
  - Examples: 3212
  - Size: 28030910 bytes
- Validation set (val):
  - Examples: 356
  - Size: 3166612 bytes
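The split figures imply roughly a 90/10 train/validation split. A short calculation that uses only the numbers listed above makes the proportions and the average prepared size per example explicit:

```python
# Per-split figures copied from the card's metadata.
splits = {
    "train": {"num_examples": 3212, "num_bytes": 28030910},
    "val": {"num_examples": 356, "num_bytes": 3166612},
}

total_examples = sum(s["num_examples"] for s in splits.values())  # 3568

stats = {}
for name, s in splits.items():
    share = s["num_examples"] / total_examples       # fraction of all examples
    avg_bytes = s["num_bytes"] / s["num_examples"]   # average prepared bytes per example
    stats[name] = (share, avg_bytes)
    print(f"{name}: {share:.1%} of examples, ~{avg_bytes:,.0f} bytes per example")

# Prints:
# train: 90.0% of examples, ~8,727 bytes per example
# val: 10.0% of examples, ~8,895 bytes per example
```

The two splits have nearly the same average size per example, which is consistent with the validation set being drawn from the same distribution, although the card does not say how the split was made.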
Dataset size
- Download size: 9291114 bytes
- Total dataset size: 31197522 bytes
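The two size figures measure different things. In Hugging Face `datasets` metadata, `dataset_size` is the combined size of the prepared splits (the per-split `num_bytes` values), while `download_size` counts the files actually downloaded, which are usually compressed. The check below uses only the numbers from this card to confirm that the split sizes add up and to estimate the expansion factor:

```python
# Byte counts copied from the card's metadata.
split_num_bytes = {"train": 28030910, "val": 3166612}
dataset_size = 31197522
download_size = 9291114

# The per-split sizes should add up to dataset_size: 28030910 + 3166612 = 31197522.
assert sum(split_num_bytes.values()) == dataset_size

# How much larger the prepared data is than the download.
ratio = dataset_size / download_size
print(f"prepared size is about {ratio:.2f}x the download size")
# Prints: prepared size is about 3.36x the download size
```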



