Large-Model Object Detection Fine-Tuning Dataset

Name: Large-Model Object Detection Fine-Tuning Dataset
Creator: maas
Published: 2026-05-21 13:48:04
License: not specified

Source: ModelScope community. Updated 2026-05-21; indexed 2025-05-10.

Download link:

https://modelscope.cn/datasets/Tina12345/textVQA_groundingtask_bbox


Resource description:

# TextVQA validation set with ground-truth bounding boxes

The dataset used in the paper [MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs](https://arxiv.org/pdf/2502.17422) for studying the attention patterns of multimodal large language models (MLLMs).

The dataset is sourced from [TextVQA](https://textvqa.org/dataset/) and annotated **manually** with ground-truth bounding boxes. Only questions with a single area of interest in the image are considered, which leaves 4370 of the 5000 samples.

## Citation

If you find our paper and code useful for your research and applications, please cite using this BibTeX:

```
@article{zhang2025mllms,
  title={MLLMs know where to look: Training-free perception of small visual details with multimodal LLMs},
  author={Zhang, Jiarui and Khayatkhoei, Mahyar and Chhikara, Prateek and Ilievski, Filip},
  journal={arXiv preprint arXiv:2502.17422},
  year={2025}
}
```
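As a rough illustration of the single-area-of-interest filter mentioned in the description, the following Python sketch keeps only records that carry exactly one bounding box and computes the retention rate implied by the stated counts. The record layout (`question` and `bboxes` keys, boxes as `[x, y, width, height]`) and the toy data are assumptions made for this example; the card does not document the dataset's actual column names or the authors' filtering code.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical record layout: the real column names are not given in the
# dataset card, so "question" and "bboxes" are illustrative only.
Record = Dict[str, Any]


def keep_single_region(records: Sequence[Record]) -> List[Record]:
    """Keep only records annotated with exactly one bounding box."""
    return [r for r in records if len(r["bboxes"]) == 1]


def retention_rate(kept: int, total: int) -> float:
    """Fraction of samples that survive the filter."""
    return kept / total


# Toy records invented for this sketch, not taken from the dataset.
toy_records: List[Record] = [
    {"question": "What is written on the sign?", "bboxes": [[10, 20, 50, 30]]},
    {"question": "What brands are shown?", "bboxes": [[5, 5, 40, 40], [60, 5, 40, 40]]},
    {"question": "What number is on the jersey?", "bboxes": [[100, 80, 25, 35]]},
]

kept_records = keep_single_region(toy_records)
print(len(kept_records), "of", len(toy_records), "toy records kept")

# Counts stated in the dataset description: 4370 of 5000 samples kept.
print(f"Retention rate: {retention_rate(4370, 5000):.1%}")
```

With the counts given in the description, the filter retains 87.4% of the TextVQA validation samples.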


Provider:

maas

Created:

2025-05-08

Summary

Dataset introduction