tech4humans/br-doc-extraction
Hugging Face, updated 2025-05-11, indexed 2025-07-05
Download link:
https://hf-mirror.com/datasets/tech4humans/br-doc-extraction
Resource description:
The Brazilian Document Structure Extraction dataset contains 1,218 images of Brazilian identification documents (CNH, the National Driver's License, and RG, the General Registration card) and invoices (NF, Nota Fiscal). Each image is paired with a user-defined JSON schema (the "prefix") and the corresponding structured data extraction, stored as a JSON string (the "suffix"). The primary goal of the dataset is to support fine-tuning Vision-Language Models (VLMs) to extract structured information from diverse Brazilian document images.
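To make the prefix/suffix pairing concrete, here is a minimal sketch of how one record might be decoded and sanity-checked. The field names `prefix` and `suffix` come from the description above; everything else is an assumption for illustration: the sample schema, the sample values, the idea that the prefix is itself stored as a JSON string, and the helper names `parse_pair` and `missing_required`. It is not the dataset's actual content or an official loader.

```python
import json

# Hypothetical record. Only the field names "prefix" and "suffix" are taken
# from the dataset description; the schema and values below are invented.
record = {
    # Assumption: the user-defined schema is stored as a JSON string.
    "prefix": json.dumps({
        "type": "object",
        "properties": {
            "nome": {"type": "string"},
            "cpf": {"type": "string"},
        },
        "required": ["nome", "cpf"],
    }),
    # The structured extraction, as a JSON string.
    "suffix": json.dumps({"nome": "Maria da Silva", "cpf": "000.000.000-00"}),
}


def parse_pair(rec):
    """Decode the schema (prefix) and the extraction (suffix) from JSON strings."""
    schema = json.loads(rec["prefix"])
    extraction = json.loads(rec["suffix"])
    return schema, extraction


def missing_required(schema, extraction):
    """Return the required top-level keys of the schema absent from the extraction."""
    return [key for key in schema.get("required", []) if key not in extraction]


schema, extraction = parse_pair(record)
print(missing_required(schema, extraction))
```

A check like this only covers required top-level keys; a full JSON Schema validator would be needed to verify types and nested structure.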
Provider:
tech4humans



