lightonai/LightOnOCR-bbox-bench
Source: Hugging Face · Updated 2026-01-23 · Indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/lightonai/LightOnOCR-bbox-bench
Description:
LightOnOCR-bbox-bench is an evaluation benchmark that measures how well vision-language models (VLMs) localize images within documents using bounding boxes. The dataset consists of two subsets: arxiv (565 samples from scientific papers) and olmocr_bench (290 samples from diverse document types). Each sample contains 1-5 images to localize, with ground-truth bounding boxes normalized to a 0-1000 coordinate space. Given a document page (PDF), the model must predict bounding boxes around the images on it (figures, charts, photographs, etc.). The dataset is sourced from arXiv scientific papers and allenai/olmOCR-bench, and is used to evaluate a model's spatial understanding and its ability to distinguish visual content from text.
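To make the 0-1000 coordinate space concrete, here is a minimal Python sketch that maps a normalized box back to pixel coordinates and compares a predicted box with a ground-truth box. The `(x1, y1, x2, y2)` corner layout, the helper names `denormalize` and `iou`, and the use of intersection-over-union as the comparison are all assumptions made for illustration; the description above does not state the box layout or the benchmark's actual scoring metric.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) with top-left and bottom-right corners.
Box = Tuple[float, float, float, float]


def denormalize(box: Box, width: int, height: int, scale: int = 1000) -> Box:
    """Map a box from the normalized 0-`scale` space to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        x1 * width / scale,
        y1 * height / scale,
        x2 * width / scale,
        y2 * height / scale,
    )


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes in the same coordinate space."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical values, not taken from the dataset.
    ground_truth = (100, 200, 500, 600)
    prediction = (120, 210, 510, 590)
    print(denormalize(ground_truth, width=2000, height=1000))
    print(round(iou(ground_truth, prediction), 3))
```

Because both boxes share the same normalized space, the overlap can be computed directly on the 0-1000 values; converting to pixels is only needed when drawing boxes on a rendered page.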
Provider:
lightonai



