lightonai/LightOnOCR-bbox-bench

Name: lightonai/LightOnOCR-bbox-bench
Creator: lightonai
Published: 2026-01-23 13:06:52
License: not specified

Source: Hugging Face; updated 2026-01-23; indexed 2026-02-07

Download link:

https://hf-mirror.com/datasets/lightonai/LightOnOCR-bbox-bench

Description:

LightOnOCR-bbox-bench is an evaluation benchmark that measures how well vision-language models (VLMs) localize images within documents using bounding boxes. The dataset consists of two subsets: arxiv (565 samples from scientific papers) and olmocr_bench (290 samples from diverse document types). Each sample contains 1 to 5 images to localize, with ground-truth bounding boxes normalized to a 0-1000 coordinate space. Given a document page (PDF), the model must predict bounding boxes around the images on it (figures, charts, photographs, etc.). The data is sourced from arXiv scientific papers and allenai/olmOCR-bench, and is used to evaluate a model's spatial understanding and its ability to distinguish visual content from text.
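
To make the 0-1000 coordinate space concrete, the sketch below maps a normalized box back to pixel coordinates and compares a predicted box against a ground-truth box. Several things here are assumptions and not taken from the dataset card: the (x_min, y_min, x_max, y_max) corner order, the use of intersection over union (IoU) as the comparison, and the helper names `to_pixels` and `iou`. The page size in the usage lines is likewise illustrative.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max). The dataset card excerpt
# does not state the corner order, so treat this as a hypothetical convention.
Box = Tuple[float, float, float, float]


def to_pixels(box: Box, width: int, height: int, scale: float = 1000.0) -> Box:
    """Map a box from the 0-1000 normalized space to pixel coordinates."""
    x0, y0, x1, y1 = box
    return (
        x0 / scale * width,
        y0 / scale * height,
        x1 / scale * width,
        y1 / scale * height,
    )


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes in the same space."""
    # Corners of the overlapping rectangle (may be empty).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Illustrative values only: a made-up ground-truth box, a made-up
    # prediction, and a made-up rendered page size in pixels.
    truth = (100.0, 200.0, 500.0, 600.0)
    pred = (120.0, 210.0, 510.0, 590.0)
    print(to_pixels(truth, width=1240, height=1754))
    print(iou(truth, pred))
```

Because both boxes live in the same normalized space, the overlap can be computed directly on the 0-1000 values. Converting to pixels is only needed for drawing or cropping, and when the page is not square the pixel-space and normalized-space IoU still agree, since both axes are rescaled consistently for both boxes.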

Provider:

lightonai
