Salesforce/blip3-ocr-200m
Source: Hugging Face. Updated: 2025-02-03. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/Salesforce/blip3-ocr-200m
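Resource overview:
The BLIP3-OCR-200M dataset addresses the limitations of current Vision-Language Models (VLMs) in processing and interpreting text-rich images such as documents and charts. It integrates Optical Character Recognition (OCR) data at the pre-training stage, strengthening vision-language alignment by supplying detailed textual information. The dataset focuses on text-rich images and includes OCR-specific annotations, which make handling text-dense content such as documents and charts more accurate. It is stored in Parquet format for efficient storage, processing, and retrieval of OCR metadata and images. Its main goal is to improve the cross-modal reasoning of VLMs on tasks involving complex text-rich images.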
The BLIP3-OCR-200M dataset is designed to enhance Vision-Language Models (VLMs) by incorporating Optical Character Recognition (OCR) data during the pre-training phase. The dataset focuses on text-rich images and includes OCR-specific annotations, stored in Parquet format. It contains approximately 2 million samples organized into 50 Parquet files, each with detailed metadata and captions. The dataset aims to improve cross-modality reasoning capabilities of VLMs by enriching the pre-training datasets with detailed textual information. The OCR annotations are provided at 12 different levels of granularity, allowing researchers to explore varying degrees of detail in the text content extracted from images.
Provided by:
Salesforce
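
Because the listing says the data is stored as Parquet files with metadata and captions, one way to inspect it without downloading every file is to stream a few records. The following is a minimal sketch, not an official loading recipe. It assumes the Hugging Face `datasets` library is installed, that the repository ID matches the one in the download link above, and that a split named "train" exists (the listing does not state the split name). The helper name `peek_samples` is hypothetical, and no field names are assumed: the function only reports whatever keys each record carries.

```python
from itertools import islice


def peek_samples(repo_id="Salesforce/blip3-ocr-200m", split="train", n=2):
    """Stream the first n records and return the sorted field names of each.

    Assumptions: the `datasets` library is installed, the repository is
    reachable, and the split name "train" is valid for this dataset.
    """
    # Imported lazily so this module still loads when `datasets` is absent.
    from datasets import load_dataset

    # streaming=True iterates over records lazily instead of
    # downloading all Parquet files up front.
    stream = load_dataset(repo_id, split=split, streaming=True)

    # Report only the keys present in each record; no schema is assumed.
    return [sorted(record.keys()) for record in islice(stream, n)]
```

Calling `peek_samples(n=1)` should return a list with one entry holding the field names of the first record, which is a quick way to see how the OCR annotations and captions are laid out before committing to a full download.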

The content above was collected and summarized by 遇见数据集.



