sitloboi2012/CMDS_Multimodal_Document
Hugging Face | Updated 2023-10-01 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/sitloboi2012/CMDS_Multimodal_Document
Resource description:
---
license: apache-2.0
task_categories:
- image-classification
- text-classification
- image-to-text
language:
- bg
tags:
- DocumentAI
- ImageClassification
- SequenceClassification
pretty_name: CMDS Document Images Dataset
size_categories:
- 1K<n<10K
---
# Dataset Card for Cyrillic Multimodal Document (CMDS)
This dataset consists of 3789 image-text pairs across 31 categories, downloaded from the Bulgarian Ministry of Finance.
### Supported Tasks and Leaderboards
Use this dataset for downstream tasks such as document classification, image classification, or text (sequence) classification. It is suitable for multimodal models such as the LayoutLM family and Donut.
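For the classification tasks above, the string labels usually need to be mapped to integer ids before training. The following is a minimal sketch of one way to do that, not the dataset author's method; the function name `build_label_maps` and the example label values are hypothetical and do not come from the dataset.

```python
from typing import Dict, Iterable, Tuple


def build_label_maps(labels: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct string label to a stable integer id, and back."""
    # Sorting makes the mapping deterministic across runs.
    unique = sorted(set(labels))
    label2id = {name: idx for idx, name in enumerate(unique)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


# Hypothetical label values, for illustration only (not the real 31 labels).
example_labels = ["invoice", "report", "invoice", "budget"]
label2id, id2label = build_label_maps(example_labels)

# Integer targets in the same order as the input labels.
encoded = [label2id[name] for name in example_labels]
```

Both mappings are returned because sequence and document classification models commonly need the forward map for training targets and the reverse map for reading predictions.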
### Languages
Bulgarian
### Data Fields
- __text__ (bytes): the text that appears in the document
- __filename__ (str): the name of the file
- __image__ (PIL.Image): the image of the document
- __label__ (str): the label of the document. There are 31 different labels
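Because the `text` field is stored as bytes, it typically has to be decoded before it can be tokenized. The sketch below shows one possible way to do this on a single record. It assumes UTF-8 encoding, which the card does not state; the helper `decode_record` and the sample record values are hypothetical.

```python
from typing import Any, Dict


def decode_record(record: Dict[str, Any], encoding: str = "utf-8") -> Dict[str, Any]:
    """Return a copy of the record with `text` converted from bytes to str."""
    decoded = dict(record)  # shallow copy, the input record is left unchanged
    raw = record["text"]
    if isinstance(raw, (bytes, bytearray)):
        # Assumption: the bytes are UTF-8. Undecodable bytes are replaced
        # with U+FFFD instead of raising an exception.
        decoded["text"] = bytes(raw).decode(encoding, errors="replace")
    return decoded


# Hypothetical record shaped like the fields listed above.
# `image` would be a PIL.Image in the real dataset; None is a placeholder.
sample = {
    "text": "Министерство на финансите".encode("utf-8"),
    "filename": "example_0001.png",
    "image": None,
    "label": "example_label",
}
clean = decode_record(sample)
```

If the decoded text contains replacement characters, the files may use a different encoding, and another value can be passed through the `encoding` parameter.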
Provider:
sitloboi2012
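Original Information Summary
CMDS Document Images Dataset
Dataset Overview
The dataset contains 3789 image-text pairs across 31 categories, sourced from the Bulgarian Ministry of Finance.
Supported Tasks and Leaderboards
Suitable for downstream tasks such as document classification, image classification, or text (sequence) classification. Well suited to multimodal models such as the LayoutLM family and Donut.
Languages
Bulgarian
Data Fields
- text (bytes): the text in the document
- filename (str): the file name
- image (PIL.Image): the document image
- label (str): the document label; there are 31 distinct labels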



