
lightonai/LightOnOCR-bbox-bench

Source: Hugging Face. Updated: 2026-01-23. Indexed: 2026-02-07.
Download link:
https://hf-mirror.com/datasets/lightonai/LightOnOCR-bbox-bench
Description:
LightOnOCR-bbox-bench is an evaluation benchmark that measures how well vision-language models (VLMs) localize images within documents using bounding boxes. The dataset consists of two subsets: arxiv (565 samples from scientific papers) and olmocr_bench (290 samples from diverse document types). Each sample contains 1-5 images to localize, with ground-truth bounding boxes normalized to a 0-1000 coordinate space. Given a document page (PDF), the model must predict bounding boxes around the images on that page (figures, charts, photographs, etc.). The dataset is sourced from arXiv scientific papers and allenai/olmOCR-bench, and is used to evaluate a model's spatial understanding and its ability to distinguish visual content from text.
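As an illustration of the 0-1000 coordinate space mentioned above, the following minimal sketch maps a normalized box back to pixel coordinates and compares two boxes with intersection-over-union (IoU). The box order (x_min, y_min, x_max, y_max), the function names, and the use of IoU are assumptions made for this example. The description does not state the dataset's field layout or the benchmark's official scoring rule.

```python
from typing import Tuple

# Assumed box order: (x_min, y_min, x_max, y_max). The dataset card
# excerpt does not specify the order, so this is hypothetical.
Box = Tuple[float, float, float, float]

NORM_RANGE = 1000  # boxes are normalized to a 0-1000 space per the description


def to_pixels(box: Box, width: int, height: int) -> Box:
    """Scale a 0-1000 normalized box to pixel coordinates of a rendered page."""
    x0, y0, x1, y1 = box
    return (
        x0 * width / NORM_RANGE,
        y0 * height / NORM_RANGE,
        x1 * width / NORM_RANGE,
        y1 * height / NORM_RANGE,
    )


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes in the same coordinate space."""
    # Corners of the overlapping rectangle, if any.
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    truth: Box = (100, 200, 500, 600)       # illustrative values, not from the dataset
    prediction: Box = (120, 210, 480, 590)  # illustrative values, not from the dataset
    print(to_pixels(truth, width=2000, height=1000))
    print(iou(truth, prediction))
```

Because both the ground truth and the prediction live in the same normalized space, IoU can be computed directly on the 0-1000 values; conversion to pixels is only needed for drawing boxes on a rendered page.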
Provider:
lightonai