
sitloboi2012/CMDS_Multimodal_Document

Hugging Face · updated 2023-10-01 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/sitloboi2012/CMDS_Multimodal_Document
Resource summary:
---
license: apache-2.0
task_categories:
- image-classification
- text-classification
- image-to-text
language:
- bg
tags:
- DocumentAI
- ImageClassification
- SequenceClassification
pretty_name: CMDS Document Images Dataset
size_categories:
- n<1K
---

# Dataset Card for Cyrillic Multimodal Document (CMDS)

This dataset consists of 3789 image-text pairs across 31 categories, downloaded from the Bulgarian Ministry of Finance.

### Dataset Summary

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

### Supported Tasks and Leaderboards

Use this dataset for downstream tasks such as document classification, image classification, or text classification (sequence classification). It is suitable for multimodal models such as the LayoutLM family, Donut, etc.

### Languages

Bulgarian

### Data Fields

- __text__ (bytes): the text appearing in the document
- __filename__ (str): the name of the file
- __image__ (PIL.Image): the image of the document
- __label__ (str): the label of the document; there are 31 distinct labels
Provided by:
sitloboi2012
Summary of Original Information

CMDS Document Images Dataset

Dataset Overview

The dataset contains 3789 image-text pairs covering 31 categories, sourced from the Bulgarian Ministry of Finance.

Supported Tasks and Leaderboards

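Suitable for downstream tasks such as document classification, image classification, or text classification (sequence classification). Well suited to multimodal models such as the LayoutLM family and Donut.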
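The summary above does not mention train/validation splits. As a minimal sketch, assuming you split the data yourself for a classification task, a per-label (stratified) split keeps each of the 31 categories represented in both parts. The function `stratified_split` and its parameters are hypothetical and use only the Python standard library; scikit-learn's `train_test_split` offers the same idea through its `stratify` argument.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str],
    val_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (train_indices, val_indices) so that each
    string label keeps roughly the same share in both parts."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)

    # Group example indices by their string label.
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train: List[int] = []
    val: List[int] = []
    for label in sorted(by_label):  # sorted so a fixed seed gives a fixed split
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)
        if len(idxs) > 1:
            # Keep at least one example of the class on each side.
            n_val = min(max(n_val, 1), len(idxs) - 1)
        else:
            # A single-example class stays entirely in training.
            n_val = 0
        val.extend(idxs[:n_val])
        train.extend(idxs[n_val:])
    return sorted(train), sorted(val)


if __name__ == "__main__":
    # Hypothetical label names; the card does not list the real 31 labels.
    demo_labels = ["budget"] * 10 + ["report"] * 5 + ["memo"]
    train_idx, val_idx = stratified_split(demo_labels, val_fraction=0.2, seed=42)
    print(len(train_idx), len(val_idx))  # 13 3
```

Classes with a single example are kept entirely in training here; with real CMDS labels you may prefer to inspect per-class counts first.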

Languages

Bulgarian

Data Fields

  • text (bytes): the text in the document
  • filename (str): the file name
  • image (PIL.Image): the document image
  • label (str): the document label; there are 31 distinct labels
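The fields listed above map naturally onto a small record type. The sketch below decodes the `text` bytes and builds `label2id`/`id2label` dictionaries for the string labels, a form many sequence-classification setups expect. It assumes the bytes are UTF-8 encoded, which the card does not state. `CMDSRecord`, `decode_text`, `build_label_maps`, and `encode` are hypothetical names, and the `image` field is left out so the example needs only the standard library.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class CMDSRecord:
    # Field names mirror the card; `image` (PIL.Image) is omitted to avoid
    # a third-party dependency in this sketch.
    text: bytes
    filename: str
    label: str


def decode_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode the raw `text` field. UTF-8 is an assumption, not from the card;
    undecodable bytes become U+FFFD instead of raising."""
    return raw.decode(encoding, errors="replace")


def build_label_maps(labels: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct string label to a stable integer id (sorted order) and back."""
    unique = sorted(set(labels))
    label2id = {name: i for i, name in enumerate(unique)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def encode(records: List[CMDSRecord], label2id: Dict[str, int]) -> List[Tuple[str, int]]:
    """Turn records into (decoded_text, label_id) pairs for a text classifier."""
    return [(decode_text(r.text), label2id[r.label]) for r in records]


if __name__ == "__main__":
    # Hypothetical records; filenames and labels are placeholders.
    demo = [
        CMDSRecord("Бюджет 2023".encode("utf-8"), "doc_001.pdf", "budget"),
        CMDSRecord("Отчет".encode("utf-8"), "doc_002.pdf", "report"),
    ]
    label2id, id2label = build_label_maps(r.label for r in demo)
    print(encode(demo, label2id))
```

Sorting the label set before numbering keeps the ids stable across runs, so a saved model's `id2label` mapping stays consistent with the data.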