sitloboi2012/CMDS_Multimodal_Document

Name: sitloboi2012/CMDS_Multimodal_Document
Creator: sitloboi2012
Published: 2023-10-01 16:03:24
License: no description provided by the listing (the dataset card metadata declares apache-2.0)

Source: Hugging Face. Updated 2023-10-01. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/sitloboi2012/CMDS_Multimodal_Document

Resource description:

---
license: apache-2.0
task_categories:
- image-classification
- text-classification
- image-to-text
language:
- bg
tags:
- DocumentAI
- ImageClassification
- SequenceClassification
pretty_name: CMDS Document Images Dataset
size_categories:
- n<1K
---

# Dataset Card for Cyrillic Multimodal Document (CMDS)

This dataset consists of 3789 image-text pairs across 31 categories, downloaded from the Bulgarian Ministry of Finance.

### Dataset Summary

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

### Supported Tasks and Leaderboards

Use this dataset for downstream tasks such as document classification, image classification, or text classification (sequence classification). It is suitable for multimodal models such as the LayoutLM family, Donut, etc.

### Languages

Bulgarian

### Data Fields

- __text__ (bytes): the text that appears in the document
- __filename__ (str): the name of the file
- __image__ (PIL.Image): the image of the document
- __label__ (str): the label of the document; there are 31 distinct labels
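As a rough illustration of the data fields listed above, the sketch below normalises one record in plain Python. It assumes `text` arrives as UTF-8 encoded bytes (the card only says `bytes`), and the helper names `decode_text`, `normalise_record`, and `load_cmds` are hypothetical. `load_cmds` additionally assumes that the `datasets` library is installed and that the repository exposes a `train` split, neither of which the card confirms.

```python
from typing import Any, Dict


def decode_text(raw: Any, encoding: str = "utf-8") -> str:
    """Return the document text as str; bytes are decoded (UTF-8 is an assumption)."""
    if isinstance(raw, (bytes, bytearray)):
        return bytes(raw).decode(encoding, errors="replace")
    return str(raw)


def normalise_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a record, decoding `text` and leaving the other fields untouched."""
    out = dict(record)
    out["text"] = decode_text(record["text"])
    return out


def load_cmds(split: str = "train"):
    """Hypothetical loader: needs `pip install datasets` and network access.

    The split name "train" is an assumption, not stated in the card.
    """
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset("sitloboi2012/CMDS_Multimodal_Document", split=split)


if __name__ == "__main__":
    # Made-up record with the four documented fields (image omitted for brevity).
    sample = {
        "text": "Министерство на финансите".encode("utf-8"),
        "filename": "example.png",
        "image": None,
        "label": "example_label",
    }
    print(normalise_record(sample)["text"])
```

Decoding with `errors="replace"` keeps a batch job running when a document contains bytes that are not valid UTF-8; switch to `errors="strict"` if silent substitution would hide data problems.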

Provider:

sitloboi2012

Summary of original information

CMDS Document Images Dataset

Dataset overview

The dataset contains 3789 image-text pairs across 31 categories, sourced from the Bulgarian Ministry of Finance.

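Supported tasks and leaderboards

Suitable for downstream tasks such as document classification, image classification, or text classification (sequence classification). Also suitable for multimodal models such as the LayoutLM family, Donut, etc.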
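For the classification tasks mentioned above, string labels usually need to be mapped to integer ids, and a held-out set should keep every category represented. The following is a minimal sketch of both steps in plain Python; the function names, the 20% test fraction, and the seed are illustrative assumptions and do not come from the dataset card.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_label_maps(labels: Sequence[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct string label to a stable integer id (sorted order)."""
    names = sorted(set(labels))
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def stratified_split(
    labels: Sequence[str], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split example indices so every label keeps roughly the same test share."""
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Keep at least one training example per label, even for tiny classes.
        n_test = min(len(indices) - 1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Splitting per label rather than over the whole index range matters when some categories have only a handful of documents, since a purely random split could leave them out of the training set entirely.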

Language

Bulgarian

Data fields

text (bytes): the text in the document
filename (str): the file name
image (PIL.Image): the document image
label (str): the document label; there are 31 distinct labels
